GENES THAT ENCODE A SURFACE PROTEIN OF P. CARINII



BACKGROUND OF THE INVENTION 

Field of the Invention 

The present invention relates to the cloning of the
major surface antigen of Pneumocystis carinii, a
life-threatening opportunistic pathogen in HIV-infected
patients, and the use of that antigen as a vaccine to
prevent or control P. carinii infection.

Description of Related Art 

The AIDS epidemic was heralded by the occurrence of
Pneumocystis carinii pneumonia, a life-threatening
opportunistic disease, in patients with no previously
identified immunodeficiency (1,2). Subsequently, the
number of cases of P. carinii pneumonia increased
dramatically as human immunodeficiency virus (HIV)
infection became widespread and the virus progressively
impaired the immune system of infected patients. Over the
past 5 years, important advances in the diagnosis,
treatment, and prevention of P. carinii pneumonia have
resulted in a decline in the frequency of P. carinii
pneumonia, as well as an improvement in survival (3).

Despite these important clinical advances, the
immunopathogenesis of P. carinii pneumonia is poorly
understood. Although P. carinii was long considered a
protozoan, recent molecular biologic studies have shown it
to be a member of the fungi (4-7). The major surface antigen
of P. carinii is a mannose-rich glycoprotein of approximately
110,000 to 120,000 MW under reducing and denaturing
conditions, with a native Mr (molecular weight) of over
300,000 (8-12). P. carinii isolated from different
mammalian species contain similar but antigenically
distinct proteins (8,9,13). The present inventors have
recently purified and characterized the major surface
protein of both rat (gp116) and human (gp95) P. carinii,
and have demonstrated that about 10% of the Mr is accounted
for by N-linked carbohydrates, with distinct carbohydrate
profiles for the two proteins (8). Recent studies have
suggested that gp116 is important in organism-host cell
binding, possibly through interactions with fibronectin
(14), mannose-binding protein (15), or surfactant protein
A (16). Passive immunization studies with a monoclonal
antibody directed against a conserved epitope of this
antigen have demonstrated partial protection against P.
carinii pneumonia in rats and ferrets (17). The major
surface antigen thus appears to play a role not only in
host-organism interactions, but also in host defense
mechanisms.

SUMMARY OF THE INVENTION 

Based on Southern blot studies using chromosomal or
restricted DNA, the major surface glycoproteins of P.
carinii have been found to be the products of a multicopy
family of genes. The predicted protein has a MW of
approximately 123,000, is relatively rich in cysteine
residues (5.5%) that are very strongly conserved, and
contains a well-conserved hydrophobic region at the carboxy
terminus. The presence of multiple related genes encoding
the major surface glycoprotein of P. carinii suggests that 
antigenic variation is an important mechanism for evading 
host defenses. 

The present inventors have isolated and sequenced the
DNA (and deduced the corresponding amino acid sequences)
for seven unique genes, each of which encodes a major
surface glycoprotein of rat P. carinii. These genes are
related to the corresponding genes in human P. carinii.
Seven cDNA clones, PC3, PC5, PC14, GP3, GP22, GP46 and
GP14, encoding gp116, were isolated and the sequences
obtained suggest that gp116 is, in fact, a heterogeneous
mixture of proteins encoded by multiple related genes.
This should enable the preparation of the corresponding DNA
in P. carinii which infects humans and the corresponding
polypeptides, thus permitting the development of a vaccine
based on the major surface antigen of P. carinii strains
which infect humans.

Accordingly, it is an object of the present invention
to provide and isolate a DNA molecule encoding a mammalian
Pneumocystis carinii major surface glycoprotein or allelic
variations thereof. It is also an object of the invention
to provide a DNA molecule encoding the gene for the major
surface glycoprotein of P. carinii as shown in Figure 1B.
It is a further object of the invention to provide a DNA
molecule encoding all or a portion of the gene for the
major surface glycoprotein of P. carinii in a cDNA clone
including the clones of PC3, PC5, PC14, GP3, GP22, GP46 and
GP14.

It is an additional object of the invention to provide
DNA molecules which encode human P. carinii major surface
glycoprotein or allelic variations thereof.

It is another object of the invention to provide a
method of obtaining a DNA molecule encoding a mammalian P.
carinii major surface glycoprotein which comprises
screening a cDNA expression library of P. carinii with an
antibody to said major surface glycoprotein to identify
positive clones encoding gp116 and using at least one of
said clones or an oligonucleotide probe based on said
clones to reveal the presence of multiple genes encoding
said major surface glycoprotein.

It is a further object of the invention to provide a
mammalian Pneumocystis carinii major surface glycoprotein
having the amino acid sequence as shown in Figure 1B.

It is an additional object of the invention to provide
a mammalian Pneumocystis carinii major surface glycoprotein
produced from the expression of a DNA sequence which is a
composite (or consensus sequence) of multiple genes which
encode said major surface glycoproteins.

It is a further object of the invention to provide a
human Pneumocystis carinii major surface glycoprotein
produced from the expression of a DNA sequence which is a
composite (or consensus sequence) of multiple genes which
encode said major surface glycoprotein.

It is another object of the invention to provide a
vaccine comprising a therapeutically effective amount of a
mammalian Pneumocystis carinii major surface glycoprotein or
a polypeptide derived therefrom capable of eliciting an
immune response to said glycoprotein, and a pharmaceutically
acceptable parenteral vehicle.

It is also an object of the invention to provide a DNA
molecule encoding a mammalian Pneumocystis carinii major
surface glycoprotein which is a composite (or consensus
sequence) of multiple genes which encode said major
surface glycoprotein.

Further scope of the applicability of the present
invention will become apparent from the detailed
description and drawings provided below. However, it
should be understood that the detailed description and
specific examples, while indicating preferred embodiments
of the invention, are given by way of illustration only
since various changes and modifications within the spirit
and scope of the invention will become apparent to those
skilled in the art from this detailed description.






BRIEF DESCRIPTION OF THE DRAWINGS 

The above and other objects, features, and advantages
of the present invention will be better understood from the
following detailed descriptions taken in conjunction with
the accompanying drawings, all of which are given by way of
illustration only, and are not limitative of the present
invention, in which:

Figure 1A-1B. Alignment of the deduced amino acid
sequences SEQ ID NO: 1 through SEQ ID NO: 7, which represent
7 homologous clones encoding the major surface glycoprotein
of rat P. carinii. Alignment was performed by the Clustal
program of PC-Gene (IntelliGenetics, Inc.). Cysteine
residues are identified in bold. Potential glycosylation
sites are underlined. The peptides sequenced directly are
shown above the alignment. An * indicates that a residue
is conserved among all clones that overlap in that region.

Figure 2. Immunoblots demonstrating reactivity of
anti-peptide antibodies with the major surface glycoprotein
of rat P. carinii. Lanes 1, 3, 6, and 8, whole-organism
extract; lanes 2, 4, 5, and 7, lyticase-solubilized
proteins. Lanes 1, 2, 5, and 6, pre-immune sera (1:100);
lanes 3 and 4, hyperimmune serum (1:100) following
immunization with GP3(446-460); lanes 7 and 8, hyperimmune
serum (1:100) following immunization with PC5(365-379).
Reactivity specifically with the major surface glycoprotein
(Mr = 116,000) is seen with both hyperimmune sera. Lyticase
treatment solubilizes the major surface glycoprotein, but
results in a loss in apparent Mr of about 10%. Samples were
run on a gradient gel (8% to 16%) prior to transfer to
nitrocellulose. Migration of molecular weight markers is
indicated on the left.

Figure 3A. Southern blot of P. carinii DNA (20
µg/lane) digested with Nde I (first lane of each pair) and
Eco RI (second lane of each pair) and subsequently probed
with MSG1 (common sequence), MSG2 (GP3-specific), MSG3
(GP14-specific), or DHPS1. Standards in kilobases are
indicated on the left. P. carinii DNA was obtained from a
single infected rat. None of the regions from which the
oligonucleotides were derived contain Eco RI or Nde I
sites. All oligonucleotides were labeled at the same time,
and approximately equal numbers of counts were added for
each probe. The first two lanes were exposed for 4 hours,
and the remaining lanes for 48 hours. The presence of
multiple bands in the first two lanes demonstrates that
multiple copies of these genes are present. Fewer bands
are seen with the oligonucleotides specific for GP3 or
GP14, but all bands correspond to those seen with the
common oligonucleotide. No hybridization with MSG1 was
seen with rat DNA (blot not shown). Hybridization with
DHPS1, derived from P. carinii dihydropteroate synthase,
demonstrates the intensity of reactivity with a presumed
single-copy gene.

Figure 3B. Southern blot of P. carinii chromosomes
from 5 isolates (lanes 1 to 5) and Saccharomyces cerevisiae
chromosomes (lane 6) separated by transverse alternating
field electrophoresis (28) and probed with PC5,
demonstrating hybridization with multiple P. carinii
chromosomes in all isolates. MW, based on S. cerevisiae
chromosomes, is indicated on the right.

Figure 3C. Northern blot of total RNA extracted from
3T3 cells (10 µg, lane 1) or 5 P. carinii isolates (5-10
µg, lanes 2 to 6) probed with PC5. Hybridization to a
transcript of approximately 4,000 bases is seen in the
P. carinii lanes. Migration of rRNA is indicated on the
right.

DETAILED DESCRIPTION OF THE INVENTION

The following detailed description of the invention is
provided to aid those skilled in the art in practicing the
present invention. Even so, the following detailed
description of the invention should not be construed to
unduly limit the present invention, as modifications and
variations in the embodiments herein discussed may be made
by those of ordinary skill in the art without departing
from the spirit or scope of the present inventive
discovery.

The contents of each of the literature citations in
the present application are herein incorporated by
reference in their entirety.

The nucleotide sequence of clone PC3 (SEQ ID NO: 1)
encodes a portion of the coding sequence for the major
surface glycoprotein of rat P. carinii.

The nucleotide sequence of clone PC5 (SEQ ID NO: 2)
encodes a portion of the coding sequence for the major
surface glycoprotein of rat P. carinii.

The nucleotide sequence of clone PC14 (SEQ ID NO: 3)
encodes a portion of the coding sequence for the major
surface glycoprotein of rat P. carinii.

The nucleotide sequence of clone GP3 (SEQ ID NO: 4)
encodes a protein similar to those of the original gp116
clones and having a molecular weight of 104,048.

The nucleotide sequence of clone GP46 (SEQ ID NO: 5)
encodes a portion of the major surface glycoprotein of
rat P. carinii.

The nucleotide sequence of clone GP22 (SEQ ID NO: 6)
encodes a portion of the major surface glycoprotein of
rat P. carinii.

The nucleotide sequence of clone GP14 (SEQ ID NO: 7)
encodes a portion of the major surface glycoprotein of
rat P. carinii.

DNA (SEQ ID NO: 8) and inferred amino acid sequence
(SEQ ID NO: 9) illustrate one gene of the major surface
glycoprotein of P. carinii. The DNA sequence, which was
determined from both strands, is a composite of the
original GP3 clone (SEQ ID NO: 4) (nucleotides 626 to 3521)
and the 5' fragment (1 to 722) that was determined by PCR.
Primers used in PCR to identify the 5' end of the sequence
are underlined once, and the 5' end of the original clone
is underlined twice. The 5' fragment was missing the
first nine nucleotides of the 5' primer. The
polyadenylation signal is shown in bold.

MATERIALS AND METHODS

Materials 

Restriction enzymes were obtained from New England
Biolabs (Beverly, MA). Other enzymes or kits were obtained
from Stratagene (La Jolla, CA), Boehringer Mannheim
(Indianapolis, IN), or Invitrogen (San Diego, CA).
Polymerase chain reaction (PCR) studies were performed with
a DNA thermocycler (Perkin-Elmer/Cetus) using reagents
obtained from Perkin-Elmer/Cetus. Radiolabeled chemicals
were obtained from New England Nuclear-DuPont (Boston, MA).
Sequenase 2 was obtained from United States Biochemical
(Cleveland, OH). Oligonucleotides were synthesized on a
Cyclone-Plus DNA synthesizer (Milligen Biosearch,
Burlington, MA) using reagents obtained from Milligen.
Hybond-N+ was obtained from Amersham (Chicago).

P. carinii organisms. Organisms were obtained from
immunosuppressed rats and partially purified by
Ficoll-Hypaque density gradient centrifugation as described
(18).

P. carinii libraries. Construction of a P. carinii
cDNA library in λ ZAP has been described (4). A second
library was constructed in a similar fashion using
oligo-dT-selected mRNA and subcloning into a modified λ ZAP
vector (19), YcDEll, which contained sequences necessary
for Saccharomyces cerevisiae replication and expression
(Edman, J.C., unpublished observations). Both libraries
were constructed from RNA pooled from three P. carinii
preparations.

General Methods 

Screening of libraries. Antibody screening was
performed by described techniques (20) on approximately
50,000 phage following induction with
isopropyl-β-D-thiogalactopyranoside (10 mM) using serum
(1:1000) from a rat immunized with rat P. carinii (18).
Positive clones were plaque purified, and clones encoding
the major surface glycoprotein were identified by the
antibody elution technique (21). Briefly, approximately
5,000 phage were plated with BB4 cells on NZCYM agar; after
3-4 hours growth at 42°C, plates were overlaid with
nitrocellulose that had been soaked in 10 mM
isopropyl-β-D-thiogalactopyranoside, and incubated
overnight. Filters were blocked, incubated
overnight with hyperimmune rat serum (1:1000), and washed.
Reactive antibodies were eluted using 5 mM glycine-HCl, pH
2.3, 150 mM NaCl, 0.5% Triton-X 100, and 100 µg/ml bovine
serum albumin. After neutralization, eluted antibodies
reactive with the major surface glycoprotein were
identified by the immunoblot technique using P. carinii
antigens as described (18). For screening with DNA, probes
were labeled with [α-32P]-dCTP using the random priming
method (22). Hybridization was performed overnight at 65°C
in 6x SSPE/1% SDS/10x Denhardt's solution (1x SSPE is 0.15
M NaCl, 10 mM NaH2PO4, 1 mM EDTA-Na2, pH 7.4; 1x Denhardt's
is 0.02% polyvinylpyrrolidone, 0.02% Ficoll, 0.02% bovine
serum albumin). Filters were washed at 65°C in
0.5x SSPE/0.1% SDS. Positive clones were plaque purified,
and plasmids (pBluescript plus insert) were rescued
according to the manufacturer's instructions (Stratagene)
(19). Inserts were sequenced directly from plasmid using
the Sanger dideoxy chain termination method (23) either in
the inventors' laboratory using the Sequenase 2.0 kit or
commercially (Lofstrand, Gaithersburg, MD).
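
The 1x SSPE recipe given above fixes the component concentrations of any dilution used for hybridization or washing. The following is a minimal sketch of that arithmetic; the function name and the dictionary layout are illustrative assumptions and are not part of the described method.

```python
from fractions import Fraction

# 1x SSPE as defined in the text, in millimolar (0.15 M NaCl = 150 mM).
SSPE_1X_MM = {"NaCl": 150, "NaH2PO4": 10, "EDTA-Na2": 1}


def sspe_concentrations(fold):
    """Return component concentrations (mM) of an n-fold SSPE solution."""
    factor = Fraction(str(fold))  # exact arithmetic, so 0.5x stays exact
    return {name: factor * mm for name, mm in SSPE_1X_MM.items()}


hybridization = sspe_concentrations(6)   # 6x SSPE, hybridization buffer
wash = sspe_concentrations("0.5")        # 0.5x SSPE, filter wash

for label, mix in (("6x", hybridization), ("0.5x", wash)):
    print(label, {name: float(mm) for name, mm in mix.items()})
```

Under these definitions the 6x hybridization buffer contains 0.9 M NaCl, and the 0.5x wash contains 75 mM NaCl.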

Polymerase chain reaction (PCR). The 5' region of GP3
was determined by PCR. To identify the 5' end of the mRNA,
oligonucleotide JK58, complementary to positions 306 to 325
of PC3 (SEQ ID NO: 10) (TTAACCGGCCGTGCCATTGC), which
includes the putative initiation codon, was used as a
primer for reverse transcription (24), after which the
cDNA was tailed with terminal transferase and dGTP,
amplified by PCR as described (25), using primer JK58 and
a 1:10 ratio of modified primers ANC (SEQ ID NO: 11)
(GACTGCATGCGGAAGCTTGGATCCCCCCCCCCCCCC) and AN (SEQ ID NO:
12) (GACTGCATGCGGAAGCTTGGATCC), subcloned into pCR1000
(Invitrogen), and sequenced. The region 5' to GP3 was then
determined by reverse transcription followed by PCR (24),
using a 5' primer corresponding to the previously
determined first 20 bases of the mRNA (SEQ ID NO: 13)
(TTTTTCTAATAGACGATATG), and a 3' primer complementary to
positions 77 to 96 of GP3 (SEQ ID NO: 14)
(GATCTCCACATGTTTTAGCA), subcloning as above and sequencing.
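
Because JK58 is described as complementary to the coding strand, the sense sequence it anneals to is its reverse complement. The sketch below uses only the JK58 sequence as printed above; the helper name is an illustrative assumption. It recovers the sense strand and locates the ATG that the text calls the putative initiation codon.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


JK58 = "TTAACCGGCCGTGCCATTGC"      # SEQ ID NO: 10, as printed above

sense = reverse_complement(JK58)   # sense strand spanned by the primer
print(sense)
print("ATG found at offset", sense.find("ATG"))
```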

Southern and Northern blots. For Southern blots, P.
carinii DNA (20 µg/lane) digested with Eco RI or Nde I was
probed with the following oligonucleotides that had been
labeled with [γ-32P]ATP using T4 polynucleotide kinase (26):

MSG1: GCAGAACTTGAGTCGGAATGTTT[C,T]TATTTA (SEQ ID NO: 15);
MSG2: AAAATATCTTCCACGATGTCTTTATCCTAA (SEQ ID NO: 16);
MSG3: GAAAATAAAGATAAGAGATACCTTCCAAAG (SEQ ID NO: 17); and
DHPS1: TTGATCACGATATTAAGCCAGTTTTGCCAT (SEQ ID NO: 18).

MSG1, which corresponds to nucleotides 1346 to 1375 of GP3,
is well conserved among the overlapping clones. MSG2 (1573
to 1602 of GP3) and MSG3 (223 to 252 of GP14) are based on
regions of GP3 and GP14 that are poorly conserved in other
clones. DHPS1 is complementary to 1897 to 1926 of the P.
carinii fas gene, which encodes P. carinii dihydropteroate
synthase (27). None of the oligonucleotides contained Eco
RI or Nde I sites. For pulse-field gels, P. carinii
chromosomes were separated by transverse alternating field
electrophoresis as described (28) and probed with
[32P]-labeled PC5 or MSG1. For northern blots, RNA was
extracted using an RNA isolation kit (Stratagene) according
to the manufacturer's instructions; 5-10 µg total RNA was
separated by formaldehyde/agarose gel electrophoresis and
probed with [32P]-labeled PC5. All blots were transferred
to Hybond-N+. Blots probed with PC5 were prehybridized
overnight in 6x SSPE/1% SDS/10x Denhardt's solution,
hybridized overnight at 65°C with [32P]-labeled PC5 (55°C
for chromosome blots), then washed twice for 5 min. at room
temperature in 2x SSPE/0.1% SDS followed by two washes for
20 min. at the hybridization temperature in
0.5x SSPE/0.1% SDS or, for chromosomal blots,
0.1x SSPE/0.1% SDS. Blots probed with oligonucleotides were
prehybridized overnight in 2x SSPE/0.5% SDS/5x Denhardt's
solution/0.5 µg/ml sonicated and denatured salmon sperm
DNA, hybridized overnight at 60°C with [32P]-labeled
oligonucleotide, and washed three times at 60°C for 30
minutes each in 2x SSPE/0.5% SDS.
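
The statement that none of the probes contains an Eco RI or Nde I site can be checked directly against the printed sequences. The sketch below assumes the standard recognition sequences GAATTC (Eco RI) and CATATG (Nde I) and treats the bracketed [C,T] in MSG1 as a degenerate position; the helper names are illustrative. Both sites are palindromic, so scanning one strand is sufficient.

```python
import re
from itertools import product

SITES = {"Eco RI": "GAATTC", "Nde I": "CATATG"}

# Probes as printed above; a bracketed pair marks a degenerate position.
PROBES = {
    "MSG1": "GCAGAACTTGAGTCGGAATGTTT[C,T]TATTTA",
    "MSG2": "AAAATATCTTCCACGATGTCTTTATCCTAA",
    "MSG3": "GAAAATAAAGATAAGAGATACCTTCCAAAG",
    "DHPS1": "TTGATCACGATATTAAGCCAGTTTTGCCAT",
}


def expand(probe):
    """Yield every concrete sequence encoded by a probe with [X,Y] positions."""
    parts = re.split(r"\[([ACGT,]+)\]", probe)
    # Even indexes hold fixed text, odd indexes hold comma-separated choices.
    options = [p.split(",") if i % 2 else [p] for i, p in enumerate(parts)]
    for combo in product(*options):
        yield "".join(combo)


def sites_found(probe):
    """Return the names of restriction sites present in any expansion."""
    return sorted({name for seq in expand(probe)
                   for name, site in SITES.items() if site in seq})


for name, probe in PROBES.items():
    print(name, sites_found(probe) or "no Eco RI or Nde I site")
```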

Peptide sequencing. Peptide sequencing was performed
(Harvard µChem, Cambridge, MA) on peptides of gp116
following treatment of purified gp116 (8) with
endoproteinase LysC and separation by narrow-bore reverse
phase HPLC using previously described techniques (29).
Peptide 1 was selected for sequencing by analyzing the
predicted peptide sequences originating between the first
two predicted methionines as follows: first, peptide
retention prediction suggested such peptides would be
retained predominantly in the first quarter of the
chromatogram. Second, greater than 70% of the sequences
lacked tryptophan or tyrosine and, thus, would be deficient
in UV absorbance at 277 nm. On the basis of these two
criteria, appropriate fractions were screened by
electrospray ionization mass spectrometric analysis for a
molecular mass matching a sequence from the desired region
(29).

Anti-peptide antibodies. The following peptides were
synthesized by Peninsula Laboratories (Belmont, CA):
GP3(446-460) (SEQ ID NO: 9, amino acids 446-460)
(Glu-Leu-Lys-Gly-Lys-Leu-Gly-His-Val-Arg-Phe-Tyr-Ser-Asp-Pro),
which corresponds to amino acid residues 446 to 460 of GP3
and 453 to 467 of PC3; and PC5(365-379) (SEQ ID NO: 19)
(Glu-Leu-Arg-Gly-Asn-Leu-Gly-Leu-Val-Arg-Phe-Tyr-Ser-Asp-Pro),
which corresponds to 365 to 379 of PC5. Peptides (10 mg) were
commercially coupled (Peninsula Laboratories) to KLH (50
mg) (30), and two rabbits were immunized (Lofstrand) with
0.5 to 1.25 mg of each peptide conjugate every two weeks
for 10 weeks, using complete (first dose) or incomplete
(remaining doses) Freund's adjuvant. Immunoblots against
whole organism extracts or lyticase-solubilized gp116 were
performed as previously described (8).

Computer analysis. DNA and protein analyses were
performed using either the PC-Gene (IntelliGenetics) or
MacVector (IBI) analysis programs.

Molecular weight (Mr) determinations. The Mr for
purified major surface glycoprotein was determined for the
native protein by a sizing column used on an HPLC, and for
the reduced and denatured protein by SDS-polyacrylamide
gel electrophoresis. For the proteins encoded by the
cloned genes, the Mr was determined by the computer program
MacVector, based on the amino acid sequences of the
predicted proteins.
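
The calculation MacVector performs internally is not described here. As a hedged illustration of how a molecular weight follows from a predicted amino acid sequence, the sketch below sums standard average residue masses and adds one water; the table values, the function name, and the choice of example peptide are assumptions made for illustration. Glycosylation is not modeled, which is one reason a computed value can differ from the Mr observed for the mature glycoprotein.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def molecular_weight(protein):
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[res] for res in protein.upper()) + WATER


# One-letter form of the GP3 446-460 peptide listed in the text.
peptide = "ELKGKLGHVRFYSDP"
print(f"{molecular_weight(peptide):.1f} Da")
```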

Gene composite. The composite was generated by
alignment of the sequence of the original GP3 clone and the
sequence of the PCR-generated fragment corresponding to the
5' end of the gene. Nucleotides 626 to 722 of the PCR
fragment were found to be identical to nucleotides 1 to 96
of the original GP3 clone. This allowed appending
nucleotides 1 to 625 of the PCR fragment to the 5' end of
GP3, to generate the composite full-length clone.
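
The joining step described above amounts to finding the region where the 3' end of the PCR fragment matches the 5' end of the clone and keeping that region once. A minimal sketch follows; the function name, the `min_overlap` parameter, and the short toy sequences are hypothetical stand-ins for the real fragments and are not taken from the sequence listing.

```python
def merge_by_overlap(upstream, downstream, min_overlap=10):
    """Join two sequences where a suffix of `upstream` equals a prefix of
    `downstream`, keeping the shared region once.

    Raises ValueError if no identical overlap of at least `min_overlap`
    bases exists.
    """
    longest = min(len(upstream), len(downstream))
    for size in range(longest, min_overlap - 1, -1):
        if upstream[-size:] == downstream[:size]:
            return upstream + downstream[size:]
    raise ValueError("no sufficient overlap found")


# Toy data only: the last 10 bases of `pcr_fragment` match the first 10
# bases of `clone`, mimicking the overlap between the 5' fragment and GP3.
pcr_fragment = "ATGGCGTTAACCGGTACA"
clone = "AACCGGTACATTTGCC"

composite = merge_by_overlap(pcr_fragment, clone, min_overlap=5)
print(composite)
```

Requiring an exact match over the whole overlap is a deliberately strict choice; a sequencing error in either fragment would raise an error instead of silently producing a wrong composite.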

Consensus sequence synthesized gene. A consensus
sequence can be generated by computer alignment of the
proteins encoded by each gene from the multiple clones. A
clone containing a synthetic construction and representing
the consensus sequences, or regions that contain some of
the consensus sequences, could then be derived by molecular
biologic techniques, for example by replacing regions in
one of the clones with consensus sequences, or by using
site-directed mutagenesis (39).
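
One simple way to derive a consensus from already aligned protein sequences is a majority vote per alignment column. The sketch below assumes that rule; the text does not specify one, and the tie-breaking choice, the function name, and the three short toy sequences are illustrative assumptions (the first two resemble the opening residues of the synthesized peptides, the third is invented).

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Majority-rule consensus over equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res != gap)
        if not counts:
            result.append(gap)      # column made up entirely of gaps
            continue
        # Sorting first makes ties resolve alphabetically and reproducibly.
        result.append(max(sorted(counts), key=lambda res: counts[res]))
    return "".join(result)


toy_alignment = ["ELKGKLGHV", "ELRGNLGLV", "ELKG-LGHV"]
print(consensus(toy_alignment))
```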

RESULTS 

Identification of genes encoding the major surface
glycoprotein. Multiple clones were identified by
immunoscreening a rat P. carinii cDNA library using rat
serum generated against whole rat P. carinii (18). Clones
reactive with polyclonal serum were evaluated by the
antibody elution technique (21) to identify those
potentially encoding gp116. These clones cross-hybridized
by Southern hybridization; however, it was not
possible to align the clones by restriction mapping. Three
of these clones (PC3, PC5, and PC14) were sequenced in
their entirety and contained open reading frames encoding
three closely related but distinct proteins. Although none
of the clones contained the complete coding sequence,
overlapping regions allowed alignment of the three clones
(Figure 1A-1B) and generation of a putative composite
complete coding sequence that encoded a protein of
approximately 122,000 MW. One of these clones (PC5) was
used to screen a second cDNA library that had been
constructed in a modified lambda ZAP vector, YcDEll.
Approximately 1% of the clones hybridized to PC5. Four of
these clones, GP3, GP22, GP46 and GP14, were sequenced, and
all contained open reading frames encoding proteins highly
similar to the original gp116 clones (Figure 1A-1B). The
clone with the largest insert (GP3), 2869 bases plus a
poly-A tail, has an open reading frame encoding a
protein of 104,048 MW. Based on the sequences in the
composite protein, GP3 also appeared to be incomplete at
the 5' end; PCR was utilized to determine the full sequence
of this gene. The 5' end of the message was identified by
anchored PCR (24) using primer JK58, which spanned the
putative start codon of the composite protein. The
intervening region was determined by reverse transcription
followed by PCR, using primers spanning the 5' end to base
722 in GP3. A single clone was identified that had an
identical sequence to the first 76 bases of GP3. The
complete, composite cDNA contained an open reading frame
encoding a protein of 122,997 MW.

To demonstrate unambiguously that this cDNA encoded
the major surface glycoprotein, fragments of purified gp116
were sequenced (29). Although the amino terminus was
blocked, the sequence of an endoproteinase LysC-generated
18 amino acid fragment was obtained. This sequence is
identical to amino acids 423 to 440 in the deduced PC5
protein sequence, and is highly conserved in the other six
clones. A second sequenced peptide, identical to residues
365 to 378 of PC5, is much less well conserved among the
other clones. An additional peptide (peptide 1), identical
to residues 49 to 57 of PC5, was sequenced to show that the
protein began with the methionine at position 1, rather
than 165 (which would result in a protein of approximately
104,000 MW).

Peptides based on regions in GP3 and PC5 were used to
generate antibodies in rabbits. By immunoblot, these
anti-peptide antibodies were shown to react with intact as
well as lyticase-solubilized gp116 (see Figure 2).

Multiple genes encode the major surface glycoprotein.
Antigenic variability of surface proteins is an important
mechanism for evading host defenses in a number of
organisms (e.g., trypanosome and borrelia species) (31,32).
Upon the identification of multiple clones encoding a
family of closely related surface proteins, the present
inventors theorized that antigenic variability of surface
proteins may be important to P. carinii. Heterogeneity of
genes for gp116 may represent the occurrence of multiple
alleles for a single-copy gene, multiple genes encoding
a family of related proteins, splicing, or a combination of
these factors.

Southern hybridization experiments of P. carinii DNA
treated with Nde I or Eco RI and probed with either a
conserved 30-mer oligonucleotide (MSG1) or PC5 (blot not
shown) revealed multiple bands (Figure 3A), strongly
arguing for the presence of multiple genes. When
oligonucleotides specific for GP3 or GP14 were utilized in
Southern hybridization studies, fewer bands were seen, and
the bands were different for the two probes, although all
bands detected with these probes were also seen when
probing with MSG1. The reactivity of a given band with
MSG1 was consistently greater than with the specific
oligonucleotides (Figure 3A) despite the fact that all
probes had approximately the same specific activity. From
these experiments, the inventors concluded that multiple
similar genes encoding the major surface glycoprotein were
present in those bands, but the regions corresponding to
the specific oligonucleotides were poorly conserved in many
of these genes. Hybridization to blots of pulse-field
gel-separated P. carinii chromosomes (28) showed the
presence of gp116 sequences on multiple chromosomes (Figure
3B), consistent with multiple genes per P. carinii genome.
Single copy genes have been previously demonstrated to
hybridize to a unique chromosome in all P. carinii isolates
(28). Northern hybridization of P. carinii RNA with PC5
revealed a single band of approximately 4,000 bases (Figure
3C), demonstrating that if multiple transcripts are made,
they are similar in size. Since P. carinii cannot be grown
consistently in vitro, at present it is impossible to clone
single organisms to further clarify these issues.

Analysis of the coding regions. The genes encoding
gp116 are rich in adenosine and thymidine residues (63% in
GP3), and 69% of the residues in the third codon position
of GP3 are adenosine or thymidine, similar to the other
coding sequences of P. carinii that have been identified to
date (4,5). Nucleotide variability among the clones is not
located primarily in the wobble position of the codon: among
the 437 nucleotides that differ in PC14 and GP3, for example,
36% are in the first, 30% in the second, and 34% in the
third codon position. GP3 has a consensus polyadenylation
signal (AATAAA) at nucleotides 3470-3475.
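
The two statistics quoted above, A+T content (overall and at the third codon position) and the distribution of nucleotide differences across codon positions, can be computed from aligned, in-frame coding sequences. The sketch below illustrates this under the assumption that both inputs start at codon position 1 and contain no gaps; the function names and the toy sequences are hypothetical and are not taken from the clones.

```python
def codon_position_differences(seq_a, seq_b):
    """Count mismatches at codon positions 1, 2 and 3 of two aligned,
    in-frame coding sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    counts = [0, 0, 0]
    for index, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            counts[index % 3] += 1
    return counts


def at_fraction(seq, codon_position=None):
    """Fraction of A or T bases, optionally at one codon position (1-3)."""
    bases = seq if codon_position is None else seq[codon_position - 1::3]
    return sum(base in "AT" for base in bases) / len(bases)


# Toy sequences only, four codons each.
toy_a = "ATGAAATTTGGA"
toy_b = "ATGCAATATGGT"

print(codon_position_differences(toy_a, toy_b))
print(at_fraction(toy_a), at_fraction(toy_a, codon_position=3))
```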

Analysis of the coding sequences shows that of 135
amino acid residues common to all seven clones, only 60
(44%) are identical in all clones, although conservation is
higher among pairs of clones. GP46 and GP14 are identical
through the first 227 amino acid residues, but subsequently
diverge. The cysteine content (5.5%) of the complete
protein is relatively high, with very strong conservation of
cysteine residues: of 267 cysteine residues present in the
seven clones, only one is not conserved, and this results
from a 16-base frame-shift at nucleotide positions 1045 to
1060 of GP3 compared to PC3 and PC5. The cysteines are
concentrated primarily in three regions: 37 to 243, 329 to
758, and 914 to 941 of GP3. In these regions the cysteine
residues do not occur at random, but are most often
separated from another cysteine by six (20 occurrences) or
seven (six occurrences) amino acids. There is no
predictable pattern to the intervening amino acids. The
conservation of cysteines, together with their repetitive
nature, suggests the occurrence of a repetitive secondary
structure, such as loops formed by intramolecular disulfide
bonds, that may be functionally important. There is a
poorly conserved region rich in proline and glycine
residues between residues 817 and 870 of GP3, and a region
rich in threonine and serine residues near the carboxy
terminus (953-1052 of GP3).
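
The spacing pattern described above can be tallied by listing cysteine positions and counting the residues between each consecutive pair. The sketch below assumes that "separated by six amino acids" means six intervening residues; the function name and the toy sequence are hypothetical.

```python
from collections import Counter


def cysteine_spacings(protein):
    """Tally the number of residues lying between consecutive cysteines."""
    positions = [i for i, res in enumerate(protein) if res == "C"]
    return Counter(b - a - 1 for a, b in zip(positions, positions[1:]))


# Toy sequence: one gap of six residues, then one gap of seven.
toy_protein = "MCAAAAAACGGGGGGGCK"
spacings = cysteine_spacings(toy_protein)
print(sorted(spacings.items()))
```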
The present inventors and others (8,10,12) have
previously shown that gp116 has N-linked carbohydrate
residues that account for approximately 10% of its apparent
molecular weight. GP3 contains five potential
glycosylation sites (Asn-X-Ser/Thr) (Figure 1A). Two of
these sites (573 and 809) are conserved in the overlapping
regions of the other clones. It is unknown whether
O-linked glycosylation sites also exist in gp116. The
threonine/serine-rich region may be a candidate for such
glycosylation, as has been suggested for a serine-rich
region in yeast gp115 and a threonine-rich region in the
promastigote surface antigen-2 of Leishmania major (33,34).
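
Potential N-linked glycosylation sites of the Asn-X-Ser/Thr form can be located with a pattern scan over the protein sequence. In the sketch below, excluding proline at the X position is a commonly applied refinement and is an assumption, since the text gives the motif only as Asn-X-Ser/Thr; the function name and toy sequence are likewise illustrative.

```python
import re

# Asn-X-Ser/Thr, with X taken as any residue except Pro (assumption).
# The lookahead lets overlapping sequons such as N-N-T-S both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


toy_protein = "MNGSANPTKNNTS"
print(n_glycosylation_sites(toy_protein))
```

A match marks only a potential site; whether a given asparagine is actually glycosylated cannot be inferred from sequence alone.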

Analysis of the hydrophilicity profile of the encoded
proteins by the algorithm of Kyte and Doolittle (35)
demonstrates a single hydrophobic region common to all
clones encompassing the last 15 amino acids at the carboxy
terminus. There is no hydrophilic region compatible with
an intracytoplasmic domain, nor is there a hydrophobic
leader sequence. The position of the hydrophobic tail is
consistent with a glycosylphosphatidylinositol membrane
anchorage (36) for this surface protein, although currently
there is no evidence to support such a linkage.
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Searches of GenBank and PIR failed to identify any 
significant similarity to other known genes or proteins. 
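
A sliding-window hydropathy profile of the kind referred to above can be sketched as follows. This is a minimal illustration using the published Kyte and Doolittle hydropathy values; the function name `hydropathy_profile`, the window size, and the toy peptide are assumptions for the example and do not reproduce the analysis actually performed by the inventors.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 15) -> list:
    """Mean hydropathy of every window of the given length along the protein."""
    scores = [KD_SCALE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical toy peptide: a charged stretch followed by a hydrophobic tail.
toy_peptide = "DEKRD" + "IVLIV"
profile = hydropathy_profile(toy_peptide, window=5)
```

In such a profile, a hydrophobic carboxy-terminal tail appears as a run of strongly positive window averages at the end of the list.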

The DNA sequences for the P. carinii major surface
glycoprotein, such as shown in Figure 1b, can be modified
to provide sequences that are mutants, deletions, or
substitutions thereof which encode a protein having at
least 90% homology with the naturally occurring major
surface glycoprotein and possessing substantially the same
properties as the P. carinii major surface glycoprotein.
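
One simple reading of "at least 90% homology", assumed here purely for illustration, is percent identity over an existing pairwise alignment. The Python sketch below computes that figure; the function name `percent_identity`, the gap character "-", and the toy peptides are assumptions for the example and are not taken from this specification.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Columns that are gaps in both sequences carry no information.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical toy peptides differing at one of ten positions.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQK")
meets_threshold = identity >= 90.0
```

Other definitions of homology (for example, scoring conservative substitutions as similar) would give different figures for the same pair of sequences.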

The major surface glycoprotein of P. carinii
preferably comprises a homologous variant of said
major surface glycoprotein of P. carinii having fewer than
8 conservative amino acid changes, preferably fewer than 5
conservative amino acid changes. In this context,
"conservative amino acid changes" are substitutions of one
amino acid by another amino acid wherein the charge and
polarity of the two amino acids are not fundamentally
different. Amino acids can be divided into the following
four groups: (1) acidic amino acids, (2) neutral polar
amino acids, (3) neutral non-polar amino acids and (4)
basic amino acids. Conservative amino acid changes can be
made by substituting one amino acid within a group by
another amino acid within the same group. Representative
amino acids within these groups include, but are not
limited to, (1) acidic amino acids such as aspartic acid
and glutamic acid, (2) neutral polar amino acids such as
asparagine and glutamine, (3) neutral non-polar amino
acids such as valine, isoleucine and leucine and (4) basic
amino acids such as lysine, arginine and histidine.
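
The group-based definition above lends itself to a short worked example. The following Python sketch counts conservative and non-conservative changes between a reference and a variant of equal length. It is an illustration only: the table contains just the representative residues named above, placed in groups according to their standard side-chain chemistry, and the names `GROUPS` and `classify_changes` and the toy peptides are assumptions for the example.

```python
# Representative residues only; the groups are not exhaustive.
GROUPS = {
    "acidic": "DE",
    "neutral polar": "NQ",
    "neutral non-polar": "VIL",
    "basic": "KRH",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def classify_changes(reference: str, variant: str) -> tuple:
    """Return (conservative, other) substitution counts for two equal-length peptides."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    conservative = other = 0
    for ref, var in zip(reference.upper(), variant.upper()):
        if ref == var:
            continue
        group = GROUP_OF.get(ref)
        # A change is conservative only when both residues share a known group.
        if group is not None and group == GROUP_OF.get(var):
            conservative += 1
        else:
            other += 1
    return conservative, other


# Hypothetical toy peptides: D->E, V->L and N->Q are within-group; A->G is not listed.
counts = classify_changes("DKVNA", "EKLQG")
```

Residues absent from the table are treated as non-conservative in this sketch, which is a deliberately cautious simplification.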

In addition to the above mentioned substitutions, the
major surface glycoproteins of P. carinii of the present
invention may comprise the above mentioned specific amino
acid sequences and additional sequences at the N-terminal
end, C-terminal end or in the middle thereof. The "gene"
or nucleotide sequence may have similar substitutions which
allow it to code for the corresponding major surface
glycoproteins of P. carinii. Individual base pair changes,
deletions or insertions in the DNA encoding the major
surface glycoproteins of P. carinii can be made by the
methods of site-directed mutagenesis which are well known
in the art. See Sambrook et al. (39).
In processes for the synthesis of the major surface
glycoproteins of P. carinii, DNA which encodes the major
surface glycoproteins of P. carinii is ligated into a
replicable (reproducible) vector, the vector is used to
transform host cells, and the expressed glycoprotein is
recovered from the culture. The host cells for the
above-described vectors include gram-negative bacteria such
as E. coli, gram-positive bacteria, yeast and mammalian cells.
Suitable replicable vectors will be selected depending upon
the particular host cell chosen.

SIGNIFICANCE OF EXPERIMENTAL RESULTS

Although P. carinii has been a major pathogen in human
immunodeficiency virus-infected patients since the
beginning of the AIDS epidemic, inability to culture the
organism has made studies of immunopathogenesis very
difficult. Experiments investigating host-organism
interactions have recently focused on the major surface
glycoprotein. Although the function of this protein is
unknown, it is an abundant surface-exposed glycoprotein
that has the potential to interact with multiple host
cell-associated or secreted proteins. As a surface protein,
it is likely a primary target of the immune response. The
present inventors have shown in the current experiments
that multiple genes encode a family of related major
surface glycoproteins, and that, based on chromosomal
blots, multiple copies of these genes are present in the
P. carinii genome. Based on the presence of multiple genes,
the present inventors believe that antigenic variability
may play a role in immune evasion. Although antigenic
variability is well-known in protozoal and bacterial
pathogens (31,32), the variability of the major surface
glycoprotein is the first description of this phenomenon in
the fungi.

Previous experiments have shown that the major surface
glycoproteins of P. carinii obtained from different species
vary in size and are antigenically distinct (8,13).
However, no experiment has previously suggested variability
in the protein moiety of the major surface glycoprotein in
organisms obtained from a single species. An epitope of
the major surface glycoprotein with a critical carbohydrate
component that is conserved in P. carinii isolated from
multiple species was identified by monoclonal antibody
studies (13), and administration of a monoclonal antibody
to this epitope resulted in a decrease in the intensity of
infection in two host species (17).

CLONING OF HUMAN ANTIGENS

Based on the above, the corresponding human antigen
can be prepared as follows:

Materials 

P. carinii organisms could be obtained from human
HIV-infected patients and partially purified by Ficoll-Hypaque
density gradient centrifugation as described (18).

Human P. carinii libraries could be constructed in the
same manner as the P. carinii cDNA library in λ ZAP that
has been described (4). A second library can be
constructed in a similar fashion using oligo-dT selected
mRNA and subcloning into a modified λ ZAP vector (19),
YcDEll, which contains sequences necessary for
Saccharomyces cerevisiae replication and expression.
General Methods 

Several methods could be used to screen the human
P. carinii library.

1. The library could be screened with the already
identified rat P. carinii surface antigen genes. This
could identify the genes since antibody studies have
demonstrated that although the rat and human P. carinii
proteins are antigenically different, there is also
cross-reactivity, and thus, there is likely to be conservation
at the DNA level as well. Once one human P. carinii major
surface glycoprotein gene is identified, that gene may be
used to identify other members of the gene family.

2. The library may be screened using a conserved
oligonucleotide whose sequence is based on the available
rat P. carinii major surface glycoprotein genes. Since
conserved regions are presumably functionally important,
and since the rat and human P. carinii major surface
glycoproteins are homologous, they would have conserved the
same regions that were conserved among rat P. carinii
genes. Low stringency conditions may be used to obtain
hybridization even if the conservation is not absolute.
3. Conserved oligonucleotides based on sequences of the
rat P. carinii major surface glycoprotein genes may be used
as primers for the polymerase chain reaction, performed
using human P. carinii DNA extracts as template.
Conditions may be adjusted to low stringency if needed.
Once a human P. carinii-specific piece of DNA is amplified,
that DNA fragment may then be used to screen the library to
identify larger fragments or the entire gene.

4. Amino-acid sequence information from the purified
human P. carinii major surface glycoprotein may be obtained
by direct sequencing of proteolytic-enzyme generated
fragments, in a manner similar to that done with the rat
P. carinii major surface glycoprotein. This information may
then be used to generate oligonucleotides that may be used
either directly to screen the library, or as primers for
PCR to amplify a fragment of the human P. carinii major
surface glycoprotein gene, which may then be used for
further screening.
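
The selection of conserved oligonucleotides mentioned in methods 2 and 3 above can be illustrated with a short Python sketch. It assumes the rat gene sequences have already been aligned to equal length, reports every window that is identical in all of them, and estimates a melting temperature for each candidate with the Wallace rule (2 degrees C per A or T, 4 degrees C per G or C), a rough approximation suited to short oligonucleotides. The function names, window length and toy sequences are assumptions for the example; they are not sequences or parameters from this specification.

```python
def conserved_windows(aligned: list, length: int) -> list:
    """Return (1-based start, oligo) for each window identical in all sequences."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    seqs = [s.upper() for s in aligned]
    hits = []
    for start in range(len(seqs[0]) - length + 1):
        window = seqs[0][start:start + length]
        if "-" in window:
            continue  # skip windows that contain alignment gaps
        if all(s[start:start + length] == window for s in seqs[1:]):
            hits.append((start + 1, window))
    return hits


def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature in degrees C by the Wallace rule."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


# Hypothetical toy alignment: the sequences differ only at position 9.
toy_alignment = ["ATGCATGCAAGG", "ATGCATGCTAGG", "ATGCATGCAAGG"]
candidates = conserved_windows(toy_alignment, length=8)
candidate_tms = [(start, oligo, wallace_tm(oligo)) for start, oligo in candidates]
```

Requiring perfect identity is the strictest possible criterion; since low stringency conditions are contemplated above, a practical search might also tolerate a small number of mismatches per window.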

The identification of a multi-gene family of proteins
is difficult because P. carinii cannot be cultured or
cloned. The number of genes per genome encoding the major
surface glycoprotein is difficult to estimate based on
current data, but Southern blot experiments conducted by
the inventors using both conserved and specific
oligonucleotides have led the inventors to believe that
many similar genes exist in organisms obtained from a
single host. The use of antibodies raised against peptides,
or of oligonucleotides, specific for individual genes will
help determine if single organisms are expressing one or
more genes, or if expression of specific genes is associated
with specific stages of P. carinii.

P. carinii pneumonia has been one of the most devastating
complications of the immunosuppression associated with
human immunodeficiency virus infection. The use of
chemoprophylaxis has led to a marked decline in the
incidence of P. carinii pneumonia, but the agents used for
prophylaxis are associated with significant adverse
reactions or a high failure rate (37). The recent
demonstration that novel, potentially protective, immune
responses to HIV can be induced by immunization of
HIV-infected patients with rgp160 (38) suggests that
immunoprophylaxis may also be an effective alternative for
controlling HIV-related opportunistic infections. The
major surface glycoprotein of P. carinii can be used as a
vaccine and as a diagnostic reagent. Additionally, the
detailed study of this protein and its expression should
lead to an understanding of its functional role in the
pathogenesis of P. carinii pneumonia, and may lead to novel
strategies designed to prevent or control P. carinii
infection and its devastating consequences.

In use as a vaccine, the P. carinii major surface
glycoprotein antigen of this invention can be administered
to mammals, e.g., humans, in a variety of ways. Exemplary
methods include parenteral (subcutaneous) administration
given with a nontoxic adjuvant, such as an alum precipitate;
peroral administration given after reduction or ablation
of gastric activity; or administration in a pharmaceutical
form that protects the antigen against inactivation by
gastric juice (e.g., a protective capsule or microsphere).

The dose and dosage regimen will depend mainly upon
whether the antigen is being administered for therapeutic
or prophylactic purposes, the patient, and the patient's
history. The total pharmaceutically effective amount of
antigen administered per dose will typically be in the
range of about 5 µg to 1280 µg per patient.

For parenteral administration, the antigen will
generally be formulated in a unit dosage injectable form
(solution, suspension, emulsion) in association with a
pharmaceutically acceptable parenteral vehicle. Such
vehicles are inherently nontoxic and nontherapeutic.
Examples of such vehicles include water, saline, Ringer's
solution, dextrose solution, and 5% human serum albumin.
Non-aqueous vehicles, such as fixed oils and ethyl oleate,
may also be used. Liposomes may be used as vehicles. The
vehicle may contain minor amounts of additives, such as
substances which enhance isotonicity and chemical
stability; e.g., buffers and preservatives.

The recombinant major surface glycoprotein of this
invention can provide a reagent to be used in a variety of
diagnostic assays to detect antibodies to P. carinii, as
well as being useful in developing additional reagents that
can detect antigens in clinical specimens. The recombinant
protein can be used directly in assays to detect anti-P.
carinii antibodies. Such assays would include, for
example, ELISA (enzyme-linked immunosorbent assays),
western blot (immunoblot) and immunoprecipitation assays.
For antigen detection, antibodies, either polyclonal or
monoclonal, can be generated to the recombinant
proteins. These antibodies can then be used in
antigen-capture assays using, for example, an ELISA format,
and in immunofluorescent assays.

The sequences of the genes can also be used to make
primers for use in polymerase chain reaction studies for
the diagnosis of P. carinii infection, as well as to make
oligonucleotide probes that can be used directly in
diagnostic assays for detecting the DNA of P. carinii.

The invention being thus described, it will be obvious
that the same may be varied in many ways. Such variations
are not to be regarded as a departure from the spirit and
scope of the invention, and all such modifications as would
be obvious to one skilled in the art are intended to be
included within the scope of the following claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: Kovacs, Joseph A. 

Angus, C. W. 
Powell, Francoise 
Edman, Jeffrey C. 

(ii) TITLE OF INVENTION: GENES THAT ENCODE A SURFACE PROTEIN OF
P. CARINII

(iii) NUMBER OF SEQUENCES: 19 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Birch, Stewart, Kolasch & Birch 

(B) STREET: 8110 Gatehouse Road 

(C) CITY: Falls Church 

(D) STATE: Virginia 

(E) COUNTRY: USA 

(F) ZIP: 22042 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 07/958,683 

(B) FILING DATE: 09-OCT-1992 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Murphy Jr., Gerald M. 

(B) REGISTRATION NUMBER: 28,977 

(C) REFERENCE/DOCKET NUMBER: 1173-368P 

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: 703-205-8000
(B) TELEFAX: 703-205-8050



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2110 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Pneumocystis carinii

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
TGGTTATCCT CGTGGAGGTC ATTTGATGAA ATATATGAAG GAGGCGATAT AAGTTTTGAT 60
CATGAAAAAC TCGAATTTAA CGAATATAAT CAAGTTTTAC AAATGCTTGA AAAGGCAAAA 120
AATTGGGAAC CGGCTTTGTT GATAGAACCA AAGATTTTTC TAATAGACGA TATGAAGGGA 180
GAATTGAGTT AAATCATTTG GGGAGACGCC CAGGAGTCGA CTATTTTAGG AAAGGTGGGG 240
ATGTTTTTAC TGATGGTTAT CCTCGTGGAG GTCATTTGAT CGAGGATGAG TTGTCCGAAG 300
AGGCGGGAAT GGCACGGCCG GTTAAGAGGC AAGCAAAAGT AGTACAAGGA GCACAAGGAG 360
CACAAGATGA CATTAAGGAG GAACACCTTT TGGCTTTCAT TGCGAAGAAG GAATATAGTA 420
ATGAGGATAA ATGCAAACAA GAACTCAAGA AATATTGTGA AGAGTTGAAG GAAGCAGATG 480
GTAAATTCAA TGTTAATGAT AAAGTTAAAG AACTTTGTGG TGGTGGTGAT GAAGCAAAAC 540
GAGATAAAAA ATGCAAAGAC CTGAAAGACA AAGTTGAAGA TGAATTAGAA AATTTTGATG 600
ATGAACTTCA AGAAGCATTG AAAGACATAA AAGATGAAAA TTGTGAAAAA CATGAAGAAA 660
AATGTATACT TTTAGAAGAC ACGGGTTATA GTGAAGATAT TAAGAAGAAC TGTGTCAAGT 720
TGAGGGAAGG ATGTTACAAA TTGAAGCGTA AAAAGGTGGC AGAGGAGCTC CTTTTGAGGG 780
CGCTCGGAGG GGATGCTAAA GATGAAGCTA AATGTAAAGA AAAGATGAAA ACTGTTTGCC 840
CAATGTTAAG CCGAGAAAGT GACGAGCTGA TGTTTTTCTG CCTTGATTCG GATGGAACGT 900
GTAAAGCGCT GAAAACAAAA TCAGAAGAAG TTTGCCTGCC TTTAAAAGAA AAGCTTAAAG 960
ATGGCGAATT AAAGGAAAAA TGTCATGAAA GACTTGAGAA ATGTCATTTT TACAAAGAAG 1020
CGTGTACTGA AACAAAGTGT GATGAGGATA TGAAGCAATG CAAGGAAAAA GGATTCACAT 1080
ATAAAGCGCC GGAATCTGAT TTTAGTCCTG TCAAGCCGAA GGCGTCGTTG TTGAGAAGTA 1140
TTGGGTTGGA TGATGTGTAT AAAAAGGCTG AAAAAGAAGG AATTATTATT GGAAAATCAG 1200
GAGTGGATCT ACCAAGGAAG TCAGGTACAA AATTTCTGCA AGATCTCTTG CTACTGTTGA 1260
GCAGAGATGA GAATGATGCA GGGAAGAAAT GCGGTAAAGC GTTAGGAAAA TGTGAAACTT 1320
CTAAGTATTT GAATACTGAT TTGATGGAGT TATGCAAAGA TGCTGATAAA GAAAATAAAT 1380
GCAAAAAAAA GCTAGATGTA AAAGAAAGAT GTACAAAACT CAAGTTAAAT CTTTATGTGA 1440
AAGGGTTGTC TACGGAGTTT AAAGAAGATA AAAAATCACA TCTTTTATCG TGGGGACAGC 1500
TTCCAACATT ATTTACGAAG GGAGAGTGTG CAGAACTTGA GTCGGAATGT TTCTATTTAG 1560
AAAATGCGTG TAAAGATAAT GAGATTGGTG AAGCGTGTCA AAATCTACGA TCAGCGTGCT 1620
ATAAAAAGGG ACAAGACAGG ATGTTGAATA AGTTCTTTCA AAAGGAATTG AAGGGAAAGC 1680
TTGGTCATGT AAGATTTTAT AGCGATCCTA AAGATTGTAA AAAATATGTG GTAGAAAACT 1740
GTACAAAACT TAAAAAAGAT AAAAGATACC TTTCAAAATG TCTTTATCCT AAAGAACTAT 1800
GTTATGGGCT TTCAAATGAT ATTTTTCTCC AATCCAAAGA GTTAAGTTCG CTTTTAGATG 1860
ATCAGAGAGA TTTTCCATTT GAAAAGGATT GTCTTGAATT GGGAGAGAAG TGTGATCAAC 1920
TTAGTAGTGA TTCATTATTG AATTTAGAAA AGTGTATAAC ATTGAAAAGA CGCTGTGAAT 1980
ATTTTGACGT TACAGAAAGA TTTAGAAAAG TATTTTTAGA AAAAAAGGAT GATTCGTTAA 2040
TGATTCAGGA AAACTGTACA AAGGCATTGC ATGAGAAATG TAATACTTTA TATAAGAGGA 2100
GAAAGAATTC 2110
(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1454 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Pneumocystis carinii

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

^ ^acaac^ «— » c^cacg ca^gaaaaa ~» 
araM0OT g^aaaa* — « — — 

AGTCTTAAGG AGAAGTGTGT CAAGT.CAGG GAAGGATGTT 

_ CGAGGGGA*= CTAAAGAAGA TGQTAAATGT 
GTGGCAGAGG ATTGATGACT 
AAAGGAAAGA TGAATACTGT ^CCCAGTG »^ 
-TOCCTO ATCCGGATGG AACGTGTGGA GAGCTGAAAA CAAAATTG 

AAATGTCATT TTTACAAAGA ACCOTGTGGT ««»» «" ~" "V, 

^L— «-«~ ™— — 

• t c zzz 

r^«: — 

— ~~ GA6AATGATG CAGG^> 
GC6TTAGGAA AATGTGA.GC TTCTAAGTAT ««C» A~» ^ 
GATGGAAAGA AAAACGACAA ATGCAAAGAA TTACTAGATG TAAATGTAAA AGAAAGA 
AATTAAATCT ^TGTGAAA — » ^ * 

«— — ^ ~ ^ 

^..^ COSAATO^ CTATTTAGAA AATGCGTGTA AGGATAA*» 

„»A ATGCAAGAGC AGCG^A, — = ^ 
TTCTTTCAAA AGGAATTGAG GGGAAATCTT GGTCTTGTAA GATTTTAT 
^ AATCTGTGGT AGGAAAC^ ACAAAACTTA AAGAAGATM 
TTTATCCTAA AGAATTATGT TATGCGCTTT CAAATGATAT 
TCCAAAQAGT CAAAGGGATT — 

CTTGAATTGG TGGAGAAGTG TGATOAACTT AGTAGTGATT CATTATTGAA TT.AGAAAA 
CTTG „ m ,. rGTTA CAGAGGGATT TAGAAAAGTA 

TGTATAACAT TGAAAAGACG CTGTGAATAC TTTAAGGTTA CAG 

TTTTTAAAAA AAAA
(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2190 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Pneumocystis carinii

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:








TTCCCAAAAC TTTTATCGTG GGGACAGCTT CCAACCCTTT TTATAAAAGG AGAGTGTGCA 60
GAACTTGAGT CGGAATGCTT CTATTTAGAA AATGCGTGTA CGAATAAGAT TGGTGAAGCA 120
TGTCAAAATG TACGATCAGC GTGCTATAAA AAGGGACAAG ACAGGATGTT GAATACGTTG 180
TTTCGAGAGG AGATGAAGGG AAAGCTTGCT AATATAAAAT ATTTTAATGA TACTGAAAGT 240
TGCAAAAAAT CAGTGGCAAA AAAGTGTGCA GAACTTGATA AAAGATACCT TTCAAAATGT 300
CTTTACCCTA AAAGACTATG TTATGTGCTT TCAGATGATA TTTTTCTTCA ATCAA&AGAG 360
TTAAGTGTGC TTTTAGATGA TCAGAGAGAT TTTCCATTAG AAAAGGATTG TGTTG-&ATTG 420
GGAGAGAAGT GTGATGAACT TGGTAGTGAT TCATTATTGA ATTTAGAAAA GTGTATAACA 480
TTGAAAAGAC GCTGTGAATA CTTTAAGGTT ACAGAAAGAT TTAGAAAAGT ATTTTrAGAA 540
AGAAAAGATC ATTCATTATA CGATGAGCAA AATTGTACGA AGGCGTTGCA TGAGA&ATGT 600
GAAGCTTTAT TTAGGAAAAG GAGGAATCCA TTTGAGTTTT CATGTGCTTT GCAAG.AAGAA 660
ACATGTCAAC GTATGGTATA CCATACAACT CAAGATTGTA TTTATTTAAA AGACARCATC 720
AAAAATAAAA AAATTCTAGA ACAAATTGGA AAAGTAAAAC AGGATAAATC AAAAGAAGCA 780
GAAGTAGAAG AACTCTGCAC AACATGGGGT AAATATTGTC ACCAACTTAT GGAGAATTGT 840
CCAGATAAGT TGAAAAAAAA AAAAAAAAAA GACAATGACA ATAATCAAAA CTGCGAAGAA 900
CTCGAAAAAA AATGCACTGA TACCTTTAAA AAGTTGGAAT TGAAGGATGA GCTGACTCAT 960
CTGTTGAAAG GCAGCTTAAA GGATAAAGAA AAATGTAAAG TAACACTAGG ACAGCGTTGC 1020
CCTGAGTTGA AAAATAATGA TACATTCAAA ATTCTGCTTA CTAATTGTGA AGATTCCTTG 1080
GAAAATGTTT GCGCGGAATT AGTTAAAAAA GTACAGAAGA AATGTCCTAC TTTAAAAGAC 1140
GAACTGAATA AAGCGAAAGA TGAGTTGACA AAGATGAAGA CTGAGTACGA AAATGCTAAA 1200
AAGGCGGCAG AAGAATCTAC AAACAAAGCT AGCTTATTGC TATCAAAGTC TGGAAAAGCC 1260
GCAATGCCAA CTGCGCAGAA TGGCAGTGCT TCTGCACCAC CATCAGCACC AGCAGAATCA 1320
GGATCATCAC CAGCATCAGG GTCACCACCA GCATCAGAGC CATCAACTAA TGGAAAGGTG 1380
GACACGCCAG CTGGAGGATC AGGGACACAA GATAAAACAT CAGACGCATC AGGTCAAACG 1440
ACGAAGTATA CAAAACTTGG ACTCGTTAAA AGAGCATATG TAGCTGAAGG AGTATCAGAA 1500
GAAGAGGTAA AAGCATTTGA TGCAACGACG GTAGCATTGG AATTGTATTT GGAATTGAkA 1560
GAGGAATGCA ATGCTTTAGA ACTAGATTGG GGTTTTAAAG AGGATTGTGA GGAATCTA&A 1620
CCAGCTTGTA AAGAAATAGA AGAGTTATGC AAAGGAATAG AATCATTAAA AGTTACGCCT 1680
CATCATACAG AGACGCAAAA AGAAATCTCA ACCACTACGA CGACCACTAC TACGACCACT 1740
ACCACGACTA CTACCACGAC TACGACGACA ACTACTACTA CAACCAAGCC GGGAAGTGGA 1800
GGAAAAGTAA CAGAAGAGTG TACAATGATA CAAACAACAG ATACATGGGT GACACGTACG 1860
TCATTGCATA CGAGTACGAC AACGAGTACG TCGACAGTGA CGTCGACAGT GACATTGACG 1920
TCGATGCGCA AGTGCAAGCC TACCAAATGT ACCACTGATT CAAGCAAAGA GACAGAAGAA 1980
GGAGGAAAAG AAGAAGAAGA AGTAAAACCG AATGATGGGA TGAAAATAAG AGTTCCTGAT 2040
ATGATTAAAA TAATATTGTT GGGAGTGATT GTTATGGGGA TGATGTAAAA TGAATGAAAA 2100
AAATGTTAAT AGAATGAAAA TGTGCATATA TCCATTGTTT ATATATAATA GAAATCTAAA 2160
TGAATGAAAT GAAGTTTTAA TAATTTTAAG 2190
(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3521 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Pneumocystis carinii

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
TAGACGATAT GAAGAGAGAA TTGAGTTAAA TCATTTGGGG AGACGCCCAG GAGTCGACTA 60
TTTTAGGAAA GGTGGGGATG TTTTTACTGA TGGTTATCCT CGTGGAGGTC ATTTGATCGA 120
GGATGAGTTG TCCGAAGAGG TGGCAATGGC ACGGCCGGTT AAGAGGCAAG CAGTACAAGG 180
AGCACAAGAT GAGATTGATG AGAAACACCT TTTGGCTTTC ATTGTGAAGG ACAAATATAA 240
AGAAGAACAA AAATGCAAAG AAGAACTCGA GAAATATTGT AAAGAGTTGA AGGAAGCAGA 300
TAAAAATCTA GAGAATGTGG ATGATAAAGT TAAAGGGCTT TGTGATGATA AAAAACGAGA 360
CGAAAAATGC AAAGACGTGA AAAAAAAAGT TGAAGATGAA TTAAAAGATT TTGAAGAGGA 420
ACTTCAAAAA GTATTGAATA ATATAAAAGA TGAAAATTGC GAAAAATATG AAGAAAAATG 480
TATACTTTTA GAAGAGACGG ATTATGATGT TATTAAGGAT AACTGTATCG AGTTGAGGGA 540
AGGATGTTAC AAATTGAAGC GTGAAAAGGT GGCAGAGGAG CTTCTTCTGA GGGCGCTCGG 600
AGGGGATGCT AAAGAAGAAG CTAAATGTAA AGGAAAGATG AATACTGTTT GCCCAGTGTT 660
GAGCCGAGAA AGCGACGAAT TGATGTCTTT TTGCCTTGAT TCTGCTAAAA CATGTGGAGA 720
TCTGAAAAAA AAATTGGGTA CTGTTTGCGA GCCTTTAAAA AAAGAGCTTA AAGATAACGA 780
ATTAGCGGAA AAGTGTCATG AAAGACTTGA GAAATGTCAT TTTTACGGAG AAGCGTGTGA 840
TGATGCGAAA TGCAAGAAGT TTGAGGAGCA ATGCAAGGGA AAAAATATTA TATATAAAGC 900
GCCAGAATCT GATCTTAGTC CTGTCAAGCC GAGGGCGTCC TTGTTGAGAA GTATTGGGTT 960
GGATGATGTG TATAAAAACG CGGAAAAACA TGGGATTATT ATTGGAAAAT CAGGAGTGGA 1020
TCTACCAAGG AAGTCAGGTA CAAATTTCTG CAAGATCTCT TTGCTACTGT TGAGCAGAGA 1080
TGAGGATAAG AAGGAACCAG ATAAAAAGTG CACTAAAGCG TTAGAAAAAT GTGATGCCTC 1140
TAAGTATTTG AATACTGAAT TGGAAAAGTT ATGTAAAGAT GGAAACAAAA ACGAAAAATG 1200
CAAAAAAATA TTAGATGTAA AAGAAAGATG TACAAATCTC AAATTAAAAC TTTATCTGAA 1260
AGGATTGTCT ACGGAATATG ATGATCAAGA ATCAGATCCT TTATCGTGGG GACAGCTGCC 1320
AACTTTTTTT ATAAAAGGAG AGTGTGCAGA ACTTGAGTCG GAATGTTTCT ATTTAGAAAA 1380
GGCGTC3TAAA GATAATAATA TTGATAAAGC GTGCCAAAAT GCAAGAGCAG CGTGCTATAA 1440
AAAGGGACAA GACAGGA7GT TGAATAAGTT CTTTCAAAAG GAATTGAAGG GAAAGCTTGG 1500
TCATGTAAGA TTTTATAGCG ATCCTAAAGA TTGTAAAAAA TATGTGGTAG AAAACTGTAC 1560
AAAACTTGAT AAAAAATATC TTCCACGATG TCTTTATCCT AAAGAACTAT GTTATGGGCT 1620
TTCAAATGAT ATTTTTCTTC AATCCAAAGA GTTAAGTGCG CTTTTGGATG ATCAAAGGGA 1680
TTTTCCATTA AAAAAGGATT GTGTTGAGTT GAAGGAGAAG TGTGATGAAC TTAGTAGTGA 1740
TTCATTATTG AATTTAGAAA AGTGTATAAC ATTGAAAAGA CGTTGTGAAT ACTTTAGAGT 1800
TTCAGAGGGA TTTAGAAATG TATTTTTAGA AAAAAAGGAT GATTCGTTAA TGACTCAGGA 1860
TAACTGTACA AAGGCATTGC ATGAGAAATG CCATCAATTA TATAGGAGGA GAAAGJ&ATTC 1920
ATTTAGTGTT TCATGTGCTT TACCAGAAGA AACATGTAGT TATATGGTAT TCCATACAAG 1980
TCAAGATTGT AGTAGTTTAA AAGTCAACAT CAAGAATGAA AAAATTCTAG AAAAAATTGG 2040
AGAAGAAATT AAAAAAGCAA ATAAAAATGA AGCCTTGGTT GAAGAACTCT GCACA&CATG 2100
GGGCCGACAT TGTCACCAAC TTATGGAGAA TTGTCCGGAT GACTTGAAAA AAAAAGAGAA 2160
TGGCAATGGC AATGATCATA ACTGCGAAGC ACTCCAAGAA AAATGCAATA AAACCTTTGA 2220
AAAGTTGAAA TTAGAGGAGG AGCTGAGTCA TCTGTTGAAA GGCAGTTTAA AGGATGATAA 2280
ATGTAAAGAA GCATTAGGAA AGCGTTGCAC TGAGTTGGAA AAGAATGAAG CATTCAAAAC 2340
TCTGTATGGT AAATGTGATG ATAATACCAA GGAAAATGTT TGCAAAAAAT TAGTTGATAA 2400
[illegible in source] AAGAGTTGAC 2460
AAAGATGAAG AATGAGTACG ATGATCTCAA AAAGGCGGCA GAAAAATCTA CGGAGGCAGC 2520
TAAGTTATTG CTATCAAGAC CTAGACAAAC TGTAATGCCA AATGCGCAGA ATGGCAGTGA 2580
TTCTACACTA GTACCACCAC CACCACAAGC ACCAGCAGGG CCACCACCAC CAGGGTCACC 2640
ACCACCACCA CCATCACAAA ATGGAACGCC AGGCACACCA GGTGGAGAAA CAGGCGCATC 2700
AGGTGGAACA CCAGGCACAC CAGGCACACC AGGCACACCA GGCACACCAG GTGGAATGAT 2760
GAAGTATGCA AAACTTGGAC TCGTTAAAAG AACGTATGTA GATGGAGGTG TATCAGAAGT 2820
AGAGGTCAAA GCATTTGATG CAACGACGAT AGCATTGGAA TTGTATTTGG AATTGAi\AGA 2880

AGAATGTAAA GCTTTAGAAT TAGATTGCGG TTTTAAAGAG GATTGTCCAG ATACTAAACA 2940 

AGCTTGCGAA AATATAGACA CTTTATGTAA ACTGGAACCA TTAGAAATTA AGCCTCATCA 3000 

TACAGAGAAA ATAACAGAAA CAAAGACGGA AACGAAGACG GAAACAAAGA CGGAAACAAA 3060 

GACTGATGGC AAGGCTGATG AAAAGACCGT TGAGAAGACT GTTACAGAAA CCAAGTCAGT 3120 

AGGTGGAGGA AAAGTAACAG AAGAGTGTAC AATGATACAA ACAACAGATA CATGGGTGAC 3180 

GAGTACGTCA TTGCATACGA GTACGACAAC GAGTACGTCA ACGGTGACGT CGACAGTGAC 3240 

GTTGACTTCG ATGCGCAAGT GCAAGCCTAC CAAATGTACC ACCGATTCAA GCAAAGAGAC 3300 

ACAGAAAGAA GAAGATGATG AAGAAGTGAA ACCGAATGAG GGAATGAAAA TAAGAGTTCC 3360 

TGATATGATT AAAATAATGT TGTTGGGAGT GATTGTTATG GGGATGATGT AAATGAATGA 3420 

AAAAAATGTT AATAGATTGA AAATGTGCAT ATATCCATTG TTTATATATA ATAAAAATGT 3480 

AAATGAATGA AATGAAAAAA AAAAAAAAAA AAAAAAAAAA A 3521 
(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2058 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Pneumocystis carinii

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:



ATGTTGAGTA CGTTGTTTCG AAAGGAATCG AAGGGGGAGT CTGGTCATAA AAGATATTAT 60
AACCATCCTG AGGAATGCCA AAAATCTGTG GTAAAAGACT GTAAAAAACT TGAAAATAAA 120
GATAAGAGAT ACCTTCCAAA GTGTCTTTAT CCTAAAGAAC TATGTTATAT GCTTTCAGAT 180
GATATTTTCC TTCAATCCAA AGAGTTGGGA GCGCTTTTGG ATGATCAAAG GGATTTTCCA 240
TTAGAAAAGC ATTGTGTTGA ATTGAAGGAG AAGTGTGATG AACTTGAAAC TTATTCACAT 300
TCGAATTCGG AAAAGTGTAT AACATTGAGA AGGCGCTGTG AATACCTTAG AGTTTCAGAG 360
GAATTTAGAA AAGTATTTTT AAAAAGAAAA GATCATGCAT TATATAATGA GCAAAACTGT 420
ACGGAGGTGT TGCAAGAAAA ATGTAATACT TTATATAGGA GGAGAAAGAA TTCATTTAGT 480
GTTTCATGTG CTTTGCCAGG AGAAACATGT GAATATATGG TATACCGTAC AAAAGATGAA 540
TGTTTTTATT TAAGTGGCAA CATGGAGGAT GAAAAAATTG TAGARGAAAT TGGAAAGAAA 600
AAAGCAAATG AAACAGCACT CGAAGAACTC TGCACAACAT GGGGCCGACA TTGTCACCAA 660
CTTATGGAGA ATTGTCCGGA TGACTTGAAA AAAAAAGAGA ATGGCAATGA CAATGATCAT 720
AACTGTGAAG AACTCGATGA AAAATGCAGT GATACCTTTA AAAGGTTGAA ATTAGAGGAG 780
GAGCTGACTC ATCTGTTGAA AGGCAGCTTA AAGGATAAGG ATGAATGTAA AAAAACATTA 840

GAAAAGCGTT GCACTGAGTT GCAAAATAAT GAAACATTTA AAAATCTGCT TAGTTATTGT 900 

GGAGAGAATG ACAAGGGAAC TGTTTGCGAA AAATTAGTTG AAAAACTAAA AAAGAGATGT 960 

CCTACTTTAA AAGACGGACT GAATAAAGCG AAAGATGAGT TGACAAAGAT GAAAAJXAGAA 1020 

TACGATGCGC TTAAAAAGGC GGCAGAAGAA TCTACAAAGG AAGCTAGCTT ATTGCTATCA 1080 

AGACCTAGAC AAACTGTAAT GCCAAGTGCG CAGAATGGCA GTGCTTCAGA GCAAGTATTA 1140 

CAACCAGTAC AACCAGAATC AGGGTCATCA TCAGGGTCGC CATCATCACC ACCAGGGCCA 1200 

CCATCAGCAC CACCACAAAA TGGAACGCCA GCCACACCAG GTGGAGCACC AGGCA(ZACCA 1260 

AGCAGTGGAA CGACGGGCCC TGCAAAACTT GGACTCGTTA AAAGAGCATA TGTAGCTGAA 1320 

GGAGTATCAG AAGCAGAGGT CAAAGCATTT GATGCAACAA CGATAGCATT GGAGTTGTAT 1380 

TTGGAATTGA AAGAAGAATG TAAAGCTTTA GAATTAGATT GCGGTTTTAA AGAGGATTGT 1440

AAGGAAACTG AACCAGCTTG TAAAGAAATA GAAAAGTTAT GTAAACTGGA AGCATrAAAA 1500 

GTTGCGCCTC ATCATACAGA GACAATAACA AATAAGGTGA CGGAAACACA GACG&&AACA 1560 

AAGACCGTTG AGAAGGTCGA TGACAAGGCT GATGTGAAGA CCGTTGAGAA GACTGTTACG 1620 

GTAACCAAAC CAGGAAGTGG AGAAAAAGTA ACAGAAGAGT GTACAATGAT ACAAACAACA 1680 

GATACATGGG TGACAAGCAC GTCATTGCAT ACGAGTACGA CAACGAGTAC ATCGACGGTG 1740

ACGTCGACAG TGACGTTGAC CTCGATGCGC AAGTGCAAGC CTACCAAATG TACTACTGAT 1800 

TCAAGCAGAG AGACAGATAA AGGAGGAGAA GGAGAAGAAG ATGTAAAACC GAATGAGGGA 1860 

ATGAAAATAA GAGTTCCTGA TATGATTAAA ATAATGTTGT TGGGAGTGAT TGTTATGGGA 1920 

ATGATGTAAA ATGAATGAAA AAAATGTTAA TAGATTGAAA ATGTGCATAT ATCCATTGTT 1980 

TATATATAAT AGAAATCTAA ATGAATGAAA TGAAGTTTTA ATTTTAATAC ACCAAAAAAA 2040 

AAAAAAAAAA AAAAAAAA 2058 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2110 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Pneumocystis carinii

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

GCAGAACTTG TCTCGGAATG TTTTTATTTA GAAAAGGCGT GTAAAGATAA TAAAATTGAT 60 

CAAGCGTGTC AAAATGTACG AGCAGCGTGC TATAAAATGG GACAAAATAG GATGTTGAAT 120 

ATGCTCTTTC GAGAGGGGTT GAAGGAGAAT TCTGAACGTA TAAAATATTA TGATGAGAAT 180 



WO 94/09141 PCT/US93/0963S

CCTCAAAAAT GTCAAGAATT TGTGGTAGGA AGCTGTACAA AACTTAAAAA ATATCTTCCA 240

CAATGTCTTT ACCCTAAAGA ACTATGTTAT GCGGTTTCAG ATGATATTTT TCTTCAATCC 300 

AAAGAGTTGG GTGTGCTTTT GGATGATCAA AGAGATTTTC CATTAGAAGA GGATTGTCTT 360 

GAATTGAAGG AGAAGTGTGC TCAACTTGAA ACTTATTCAA ATTCGAATTC TCAAAAGTGT 420 

GCAACATTGA GAAGGCGCTG TAAATACTTA AGAGTTTCTG AGGGATTTAG AAATGTATTT 480 

TTAAAAAGAG AAGATGATTC GTTAAAGAAA GAAAACTGTA CGAAGGCATT GCAAGA&AAA 540 

TGTGATGCTT TATCTAGGAA AAGGAGGAAT CCATTTGGGT TTTCATGTGC TTTGCGAGAA 600 

GAAACATGTG AATATATGGT AGCCCGTACA AAAGACGAAT GTTTTTATTT AAAAGACAAC 660 

ATGGAGAATG AAGAAATTCT AAAAGAAATT GAAGAAAAAG CAAAAAAAGA TAATGCAAAT 720 

AGAAATGAAA CCTTGGTTGA AGAACTCTGC ACAACATGGG GCCGACATTG TCACCA&CTT 780 

GTGGGGAATT GTCCGGAGCA GTTGAAAAAA AAAAAAAAAA AAGATGATAA CAAAGATCAT 840 

AACTGTGACA AACTCGAAGA AAAATGCAGT GATACCTTTA AAAGGTTGAA ATTAGAGGAG 900 

GAGCTGACTC ATCTGTTGAA AGGAAGTTTA AAGAGTGAAG ATGAATGTAA AAAAACATTA 960 

GGAGAGCATT GCCCTGAGTT GCAAAAGAAT GATACATTCA AAACTCTGTA TGGTAAATGT 1020 

GAAGAGAATG AAAAGGGAAC TGTTTGCAAA AAATTAGTTA AAAAAGTACA AGAGAGATGT 1080 

CCTACTTTAA AAACCGATCT GGAGAAGGCG AAAAAAGAGT TGAAGGACAA GAAAGATGAA 1140 

TACGATAATG TCAAACAGGC AGCAAAAGAA TCTACGGAGA AAGCTAAGTT ATTACTATCG 1200 

AAGCCTCGAC AAACCGTAAC GCCAAATGCG CAGAATGGCA GTGCTTCTGG ACCAGTACCA 1260 

GCACCAGCAG CACCTCCAGC AGCACCAGAA GCACCAGCAC AGCCACCACC ACCAGCAGGG 1320 

CAACCAAGTG GTGAAACATC AAACGTACCA GGTAAAACGC CAAGCAAAGA AGCTGGAACA 1380 

CCAAACACAA CAGATGAAAC GACGAAGAAT CCAAGCCTTG GACTCGTTAA AAGAGCATAT 1440 

GTAGAAGGAG GTGTATCAGA AGCAGAGGTA AAAGCATTTG ATGCAACGAC AATAGCATTG 1500 

GAGTTGTATT TGGAATTGAA AGAGGAATGC AGCGCTTTAC AACTAGATTG CGGTTTTAGA 1560 

AAGGATTGTT CGAGTGTTGA AGGTGTTTGC AAAGAAATAG ACAAGTTATG TGAACTGGAA 1620 

CCATTAAAAG TTACGCCTCA TCATACAGAG ACAATAACAA ATAAGGTGAC GGAAACGAAG 1680 

ACGGAAACAA AGACAGAAAC AAAGACTGAT GACAAGGCTG ATGAGAAGAC CGGTACGAAA 1740 

ACTGTTACAG AAACCAAGAC AATAGGTGGA GGAAAAGTAA CAGAAGAGTG TACAATGGTA 1800 

CAAACAACAG ATACATGGAT AACACGTACG TCATTGCATA CGAGTACGAC AACGAGCACG 1860 

TCAACGGTGA CGTCGACAGT GACGTTGACC TCGATGCGCA AGTGCAAGCC TACCAAATGT 1920

ACCACTGATT CAACCAAAGA GACACAGAAA GAAGAAGATG ATGAAGAAGT GAAACCGAAT 1980 

GAGGGAATGA AAATAAGAGT TCCTGATATG ATTAAAATAA TGTTGTTGGG AGTGATTGTT 2040 

ACGGGGATGA TGTAAAATGA ATGAAAAAAA TGTTAATAGA TTGAAAATGT GCATATAAAA 2100 

AAAAAAAAAA 2110 
(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2126 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Pneumocystis carinii

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:



GCAGAACTTG AGTCGGAATG TTTTTATTTA AAAAAGGCGT GTAAAGATAA GGAGATTGAT 60

GAAGCATGTC AAAATGCACG AGCAGCGTGC TATAAAGTGG GAAAAGATAG GATGTTGAGT 120

ACGTTGTTTC GAAAGGAATC GAAGGGGGAG TCTGGTCATA AAAGATATTA TAACCATCCT 180

GAGGAATGCC [illegible] GGTAAAAGAC TGTAAAAAAC TTGAAAATAA AGATAAGAGA 240

[illegible] [illegible] TCCTAAAGAA CTATGTTATA TGCTTTCAGA TGATATTTTC 300

CTTCAATCCA AAGAGTTGGG [illegible] GATGATCAAA GGGATTTTCC ATTAGAAAAG 360

[illegible] AATTGAAGGA GAAGTGTGAT GAACTTGAAA CTTATTCACA TTCGAATTCG 420

GAAAAGTGTA TAACATTGAG AAGGCGCTGT GAATACCTTA GAGTTTCAGA GGAATTTAGA 480

[illegible] TAAAAAGAAA AGATCATGCA TTATATAATG AGCAAAACTG TACGGAGGTG 540

TTGCAAGAAA AATGTAATAC TTTATATAGG AGGAGAAAGA ATTCATTTAG TGTTTCATGT 600

GCTTTGCCAG GAGAAACATG TGAATATATG GTATACCGTA CAAAAGATGA ATGTTTTTAT 660

TTAAGTGGCA ACATGGAGGA TGAAAAAATT GTAGAAGAAA TTGGAAAGAA AAAAGCAAAT 720

GAAACAGCAC TCGAAGAACT CTGCACAACA TGGGGCCGAC ATTGTCACCA ACTTATGGAG 780

AATTGTCCAG ATAAGTTGAA AAAAGAAAGT GATAACAGAG ATCATAACTG TGACAAACTC 840

GAAGAAAAAT GCAGTGATAC CTTTAAAAAG TTGAAATTGA AGGAGGAGCT AACTCATCTG 900

TTGAAAGGAA GTTTAAATGA TAAAAAAAAA TGTACAGAAA CATTAGGAAA GAATTGCACT 960

GAGTTGCAAA AGAATGATAC ATTCAAAATT CTGCTTAGTG ATTGTAAAGA TTCCTTGGAA 1020

AATGTTTGCA CAAAATTAGT TGAAAAAGTA CAGAAGAGAT GTCCTGCTTT AAAAACCGAT 1080

CTAGAGGAAG CGAAAAAAGA GTTGAAGGTC AAGAAAGAAG AATATGATGC GCTCAAAAAG 1140

GCAGCAGAAG AATCCAGAAA TAAAGCTAGC TTATTGCTAT CAAGGTCTAA ACAAGCCGTA 1200

ACACCAAGTG GACAGAATGG CAGTGATTCT GTACCAGCAC AGGTACAGCC AGCACCAGCA 1260

GGGCCACCAT CAGCACCAGG GTCGCCATCA TCACCACCAT CACAAAATGG AACGCCAGGT 1320

GCACCAGATG GAACGACAGA CACAGCAGGT GGAACGACGA ATAATGCAAA ACTTGGACTC 1380

GTTAAAAGAG CGTATGTAGA TGAAGGTGTA TCAGAAGCAG AGGTAAAAGC ATTTAATGCA 1440

ACGACAATAG CATTGGAATT GTATTTGGAA TTGAAAGAGG AATGCAGCGC TTTACAACTA 1500

GATTGCGGTT TTAAAGAGGA TTGTCCAGAT ACTAAACAAG CTTGTAAAGA AATAGAAGAG 1560

TTATGTAAAC TGGAAGCATT AAAAGTTGCG CCTCATCATA CAGAGACAAT AACAGAAACG 1620 

AAGACAGAAA CGAAGACGGA AACAAAGATG GAAACAAAGA CTGATGACAA GGCTGATGAG 1680 

AAGACCGGTA CGAAAACTGT TACAGAAACC AAGACAATAG GTGGAGGAAA AGTAACAGAA 1740 

GAGTGTACAT TAGTCAAGAC AACAGATACA TGGGTGACGA GTACGTCATT GCATACGAGT 1800

ACGACAACGA GTACGTCAAC GGTGACGTCT ACAGTGACGT TGACCTCGAT GCGCAAGTGC 1860 

AAGCCTACCA AATGTACCAC CGATTCAACC AAAGAGACAC AGAAAGAAGA AGATGAAGAA 1920 

GTAAAACCGA ATAATGGGAT GAAAATAAGA GTTCCTGATA TGATTAAAAT AATGTTGTTG 1980

GGAGTGATTG TTATGGGGAT GATGTAAAAT GAATGAAAAA AATGTTAATA GATTGAAATT 2040 

GTGCATATAT CCATTGTTTA TATATAATAG AAATCTAAAT GAATGAATGA ATTAAAAAAT 2100 

AAAGTTTTAA AAAAAAAAAA AAAAAA 2126 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3521 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Pneumocystis carinii

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 146..3409

(D) OTHER INFORMATION: /product= "gp3"
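
The CDS coordinates in the FEATURE block can be checked against the protein length given for SEQ ID NO: 9. The following is a minimal sketch of that arithmetic, assuming the coordinates are 1-based and inclusive (the usual convention for this listing format); the variable names are illustrative only.

```python
# CDS coordinates as given in the FEATURE block for SEQ ID NO: 8
# (assumed 1-based and inclusive).
cds_start, cds_end = 146, 3409

cds_length = cds_end - cds_start + 1        # span in nucleotides
codons, remainder = divmod(cds_length, 3)   # whole codons and leftover bases

print(cds_length, codons, remainder)        # 3264 1088 0
```

Under this assumption the span holds 1088 complete codons, which agrees with the 1088 amino acids listed for SEQ ID NO: 9; the TAA stop codon that follows position 3409 lies outside the stated span.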

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

TAGACGATAT GAAGAGAGAA TTGAGTTAAA TCATTTGGGG AGACGCCCAG GAGTCGACTA 60 

TTTTAGGAAA GGTGGGGATG TTTTTACTGA TGGTTATCCT CGTGGAGGTC ATTTGATCGA 120 

GGATGAGTTG TCCGAAGAGG TGGCA ATG GCA CGG CCG GTT AAG AGG CAA GCA 172 

Met Ala Arg Pro Val Lys Arg Gln Ala
1 5 

GTA CAA GGA GCA CAA GAT GAG ATT GAT GAG AAA CAC CTT TTG GCT TTC 220 
Val Gln Gly Ala Gln Asp Glu Ile Asp Glu Lys His Leu Leu Ala Phe
10 15 20 25 

ATT GTG AAG GAC AAA TAT AAA GAA GAA CAA AAA TGC AAA GAA GAA CTC 268 
Ile Val Lys Asp Lys Tyr Lys Glu Glu Gln Lys Cys Lys Glu Glu Leu
30 35 40 

GAG AAA TAT TGT AAA GAG TTG AAG GAA GCA GAT AAA AAT CTA GAG AAT 316 
Glu Lys Tyr Cys Lys Glu Leu Lys Glu Ala Asp Lys Asn Leu Glu Asn
45 50 55 

GTG GAT GAT AAA GTT AAA GGG CTT TGT GAT GAT AAA AAA CGA GAC GAA 364 
Val Asp Asp Lys Val Lys Gly Leu Cys Asp Asp Lys Lys Arg Asp Glu 
60 65 70
AAA TGC AAA GAC GTG AAA AAA AAA GTT GAA GAT GAA TTA AAA GAT TTT 412
Lys Cys Lys Asp Val Lys Lys Lys Val Glu Asp Glu Leu Lys Asp Phe 
75 80 85

GAA GAG GAA CTT CAA AAA GTA TTG AAT AAT ATA AAA GAT GAA AAT TGC 460 
Glu Glu Glu Leu Gln Lys Val Leu Asn Asn Ile Lys Asp Glu Asn Cys
90 95 100 105

GAA AAA TAT GAA GAA AAA TGT ATA CTT TTA GAA GAG ACG GAT TAT GAT 508 
Glu Lys Tyr Glu Glu Lys Cys Ile Leu Leu Glu Glu Thr Asp Tyr Asp
110 115 120 

GTT ATT AAG GAT AAC TGT ATC GAG TTG AGG GAA GGA TGT TAC AAA TTG 556 
Val Ile Lys Asp Asn Cys Ile Glu Leu Arg Glu Gly Cys Tyr Lys Leu
125 130 135 

AAG CGT GAA AAG GTG GCA GAG GAG CTT CTT CTG AGG GCG CTC GGA GGG 604 
Lys Arg Glu Lys Val Ala Glu Glu Leu Leu Leu Arg Ala Leu Gly Gly 
140 145 150 

GAT GCT AAA GAA GAA GCT AAA TGT AAA GGA AAG ATG AAT ACT GTT TGC 652 
Asp Ala Lys Glu Glu Ala Lys Cys Lys Gly Lys Met Asn Thr Val Cys 
155 160 165 

CCA GTG TTG AGC CGA GAA AGC GAC GAA TTG ATG TCT TTT TGC CTT GAT 700 
Pro Val Leu Ser Arg Glu Ser Asp Glu Leu Met Ser Phe Cys Leu Asp 
170 175 180 185

TCT GCT AAA ACA TGT GGA GAT CTG AAA AAA AAA TTG GGT ACT GTT TGC 748 
Ser Ala Lys Thr Cys Gly Asp Leu Lys Lys Lys Leu Gly Thr Val Cys 
190 195 200 

GAG CCT TTA AAA AAA GAG CTT AAA GAT AAC GAA TTA GCG GAA AAG TGT 796 
Glu Pro Leu Lys Lys Glu Leu Lys Asp Asn Glu Leu Ala Glu Lys Cys 
205 210 215 

CAT GAA AGA CTT GAG AAA TGT CAT TTT TAC GGA GAA GCG TGT GAT GAT 844 
His Glu Arg Leu Glu Lys Cys His Phe Tyr Gly Glu Ala Cys Asp Asp 
220 225 230 

GCG AAA TGC AAG AAG TTT GAG GAG CAA TGC AAG GGA AAA AAT ATT ATA 892 
Ala Lys Cys Lys Lys Phe Glu Glu Gln Cys Lys Gly Lys Asn Ile Ile
235 240 245 

TAT AAA GCG CCA GAA TCT GAT CTT AGT CCT GTC AAG CCG AGG GCG TCC 940 
Tyr Lys Ala Pro Glu Ser Asp Leu Ser Pro Val Lys Pro Arg Ala Ser 
250 255 260 265 

TTG TTG AGA AGT ATT GGG TTG GAT GAT GTG TAT AAA AAC GCG GAA AAA 988 
Leu Leu Arg Ser Ile Gly Leu Asp Asp Val Tyr Lys Asn Ala Glu Lys
270 275 280 

CAT GGG ATT ATT ATT GGA AAA TCA GGA GTG GAT CTA CCA AGG AAG TCA 1036 
His Gly Ile Ile Ile Gly Lys Ser Gly Val Asp Leu Pro Arg Lys Ser
285 290 295 

GGT ACA AAT TTC TGC AAG ATC TCT TTG CTA CTG TTG AGC AGA GAT GAG 1084 
Gly Thr Asn Phe Cys Lys Ile Ser Leu Leu Leu Leu Ser Arg Asp Glu
300 305 310

GAT AAG AAG GAA CCA GAT AAA AAG TGC ACT AAA GCG TTA GAA AAA TGT 1132 
Asp Lys Lys Glu Pro Asp Lys Lys Cys Thr Lys Ala Leu Glu Lys Cys 
315 320 325 

GAT GCC TCT AAG TAT TTG AAT ACT GAA TTG GAA AAG TTA TGT AAA GAT 1180 
Asp Ala Ser Lys Tyr Leu Asn Thr Glu Leu Glu Lys Leu Cys Lys Asp 
330 335 340 345 
GGA AAC AAA AAC GAA AAA TGC AAA AAA ATA TTA GAT GTA AAA GAA AGA 1228
Gly Asn Lys Asn Glu Lys Cys Lys Lys Ile Leu Asp Val Lys Glu Arg
350 355 360 

TGT ACA AAT CTC AAA TTA AAA CTT TAT CTG AAA GGA TTG TCT ACG GAA 1276 
Cys Thr Asn Leu Lys Leu Lys Leu Tyr Leu Lys Gly Leu Ser Thr Glu
365 370 375 

TAT GAT GAT CAA GAA TCA GAT CCT TTA TCG TGG GGA CAG CTG CCA ACT 1324 
Tyr Asp Asp Gln Glu Ser Asp Pro Leu Ser Trp Gly Gln Leu Pro Thr
380 385 390 

TTT TTT ATA AAA GGA GAG TGT GCA GAA CTT GAG TCG GAA TGT TTC TAT 1372
Phe Phe Ile Lys Gly Glu Cys Ala Glu Leu Glu Ser Glu Cys Phe Tyr
395 400 405 

TTA GAA AAG GCG TGT AAA GAT AAT AAT ATT GAT AAA GCG TGC CAA AAT 1420 
Leu Glu Lys Ala Cys Lys Asp Asn Asn Ile Asp Lys Ala Cys Gln Asn
410 415 420 425 

GCA AGA GCA GCG TGC TAT AAA AAG GGA CAA GAC AGG ATG TTG AAT AAG 1468 
Ala Arg Ala Ala Cys Tyr Lys Lys Gly Gln Asp Arg Met Leu Asn Lys
430 435 440 

TTC TTT CAA AAG GAA TTG AAG GGA AAG CTT GGT CAT GTA AGA TTT TAT 1516 
Phe Phe Gln Lys Glu Leu Lys Gly Lys Leu Gly His Val Arg Phe Tyr
445 450 455

AGC GAT CCT AAA GAT TGT AAA AAA TAT GTG GTA GAA AAC TGT ACA AAA 1564 
Ser Asp Pro Lys Asp Cys Lys Lys Tyr Val Val Glu Asn Cys Thr Lys 
460 465 470 

CTT GAT AAA AAA TAT CTT CCA CGA TGT CTT TAT CCT AAA GAA CTA TGT 1612 
Leu Asp Lys Lys Tyr Leu Pro Arg Cys Leu Tyr Pro Lys Glu Leu Cys 
475 480 485 

TAT GGG CTT TCA AAT GAT ATT TTT CTT CAA TCC AAA GAG TTA AGT GCG 1660 
Tyr Gly Leu Ser Asn Asp Ile Phe Leu Gln Ser Lys Glu Leu Ser Ala
490 495 500 505 

CTT TTG GAT GAT CAA AGG GAT TTT CCA TTA AAA AAG GAT TGT GTT GAG 1708 
Leu Leu Asp Asp Gln Arg Asp Phe Pro Leu Lys Lys Asp Cys Val Glu
510 515 520 

TTG AAG GAG AAG TGT GAT GAA CTT AGT AGT GAT TCA TTA TTG AAT TTA 1756 
Leu Lys Glu Lys Cys Asp Glu Leu Ser Ser Asp Ser Leu Leu Asn Leu 
525 530 535

GAA AAG TGT ATA ACA TTG AAA AGA CGT TGT GAA TAC TTT AGA GTT TCA 1804 
Glu Lys Cys Ile Thr Leu Lys Arg Arg Cys Glu Tyr Phe Arg Val Ser
540 545 550 

GAG GGA TTT AGA AAT GTA TTT TTA GAA AAA AAG GAT GAT TCG TTA ATG 1852 
Glu Gly Phe Arg Asn Val Phe Leu Glu Lys Lys Asp Asp Ser Leu Met 
555 560 565 

ACT CAG GAT AAC TGT ACA AAG GCA TTG CAT GAG AAA TGC CAT CAA TTA 1900 
Thr Gln Asp Asn Cys Thr Lys Ala Leu His Glu Lys Cys His Gln Leu
570 575 580 585 

TAT AGG AGG AGA AAG AAT TCA TTT AGT GTT TCA TGT GCT TTA CCA GAA 1948 
Tyr Arg Arg Arg Lys Asn Ser Phe Ser Val Ser Cys Ala Leu Pro Glu 
590 595 600 

GAA ACA TGT AGT TAT ATG GTA TTC CAT ACA AGT CAA GAT TGT AGT AGT 1996 
Glu Thr Cys Ser Tyr Met Val Phe His Thr Ser Gln Asp Cys Ser Ser
605 610 615 
TTA AAA GTC AAC ATC AAG AAT GAA AAA ATT CTA GAA AAA ATT GGA GAA 2044
Leu Lys Val Asn Ile Lys Asn Glu Lys Ile Leu Glu Lys Ile Gly Glu
620 625 630 

GAA ATT AAA AAA GCA AAT AAA AAT GAA GCC TTG GTT GAA GAA CTC TGC 2092 
Glu Ile Lys Lys Ala Asn Lys Asn Glu Ala Leu Val Glu Glu Leu Cys
635 640 645

ACA ACA TGG GGC CGA CAT TGT CAC CAA CTT ATG GAG AAT TGT CCG GAT 2140 
Thr Thr Trp Gly Arg His Cys His Gln Leu Met Glu Asn Cys Pro Asp
650 655 660 665 

GAC TTG AAA AAA AAA GAG AAT GGC AAT GGC AAT GAT CAT AAC TGC GAA 2188 
Asp Leu Lys Lys Lys Glu Asn Gly Asn Gly Asn Asp His Asn Cys Glu 
670 675 680 

GCA CTC CAA GAA AAA TGC AAT AAA ACC TTT GAA AAG TTG AAA TTA GAG 2236 
Ala Leu Gln Glu Lys Cys Asn Lys Thr Phe Glu Lys Leu Lys Leu Glu
685 690 695 

GAG GAG CTG AGT CAT CTG TTG AAA GGC AGT TTA AAG GAT GAT AAA TGT 2284 
Glu Glu Leu Ser His Leu Leu Lys Gly Ser Leu Lys Asp Asp Lys Cys 
700 705 710 

AAA GAA GCA TTA GGA AAG CGT TGC ACT GAG TTG GAA AAG AAT GAA GCA 2332 
Lys Glu Ala Leu Gly Lys Arg Cys Thr Glu Leu Glu Lys Asn Glu Ala 
715 720 725 

TTC AAA ACT CTG TAT GGT AAA TGT GAT GAT AAT ACC AAG GAA AAT GTT 2380 
Phe Lys Thr Leu Tyr Gly Lys Cys Asp Asp Asn Thr Lys Glu Asn Val 
730 735 740 745 

TGC AAA AAA TTA GTT GAT AAA GTA AAA AAG AGA TGC CCT ACT TTA AAA 2428 
Cys Lys Lys Leu Val Asp Lys Val Lys Lys Arg Cys Pro Thr Leu Lys 
750 755 760 

GAC GAA CTG GAG AAT GCG AAA AAA GAG TTG ACA AAG ATG AAG AAT GAG 2476 
Asp Glu Leu Glu Asn Ala Lys Lys Glu Leu Thr Lys Met Lys Asn Glu 
765 770 775 

TAC GAT GAT CTC AAA AAG GCG GCA GAA AAA TCT ACG GAG GCA GCT AAG 2524
Tyr Asp Asp Leu Lys Lys Ala Ala Glu Lys Ser Thr Glu Ala Ala Lys 
780 785 790 

TTA TTG CTA TCA AGA CCT AGA CAA ACT GTA ATG CCA AAT GCG CAG AAT 2572 
Leu Leu Leu Ser Arg Pro Arg Gln Thr Val Met Pro Asn Ala Gln Asn
795 800 805 

GGC AGT GAT TCT ACA CTA GTA CCA CCA CCA CCA CAA GCA CCA GCA GGG 2620 
Gly Ser Asp Ser Thr Leu Val Pro Pro Pro Pro Gln Ala Pro Ala Gly
810 815 820 825 

CCA CCA CCA CCA GGG TCA CCA CCA CCA CCA CCA TCA CAA AAT GGA ACG 2668 
Pro Pro Pro Pro Gly Ser Pro Pro Pro Pro Pro Ser Gln Asn Gly Thr
830 835 840 

CCA GGC ACA CCA GGT GGA GAA ACA GGC GCA TCA GGT GGA ACA CCA GGC 2716 
Pro Gly Thr Pro Gly Gly Glu Thr Gly Ala Ser Gly Gly Thr Pro Gly 
845 850 855 

ACA CCA GGC ACA CCA GGC ACA CCA GGC ACA CCA GGT GGA ATG ATG AAG 2764 
Thr Pro Gly Thr Pro Gly Thr Pro Gly Thr Pro Gly Gly Met Met Lys 
860 865 870 

TAT GCA AAA CTT GGA CTC GTT AAA AGA ACG TAT GTA GAT GGA GGT GTA 2812 
Tyr Ala Lys Leu Gly Leu Val Lys Arg Thr Tyr Val Asp Gly Gly Val 
875 880 885 
TCA GAA GTA GAG GTC AAA GGA TTT GAT GCA ACG ACG ATA GCA TTG GAA 2860
Ser Glu Val Glu Val Lys Ala Phe Asp Ala Thr Thr Ile Ala Leu Glu
890 895 900 905 

TTG TAT TTG GAA TTG AAA GAA GAA TGT AAA GCT TTA GAA TTA GAT TGC 2908 
Leu Tyr Leu Glu Leu Lys Glu Glu Cys Lys Ala Leu Glu Leu Asp Cys
910 915 920 

GGT TTT AAA GAG GAT TGT CCA GAT ACT AAA CAA GCT TGC GAA AAT ATA 2956 
Gly Phe Lys Glu Asp Cys Pro Asp Thr Lys Gln Ala Cys Glu Asn Ile
925 930 935 

GAC ACT TTA TGT AAA CTG GAA CCA TTA GAA ATT AAG CCT CAT CAT ACA 3004 
Asp Thr Leu Cys Lys Leu Glu Pro Leu Glu Ile Lys Pro His His Thr
940 945 950 

GAG AAA ATA ACA GAA ACA AAG ACG GAA ACG AAG ACG GAA ACA AAG ACG 3052 
Glu Lys Ile Thr Glu Thr Lys Thr Glu Thr Lys Thr Glu Thr Lys Thr
955 960 965 

GAA ACA AAG ACT GAT GGC AAG GCT GAT GAA AAG ACC GTT GAG AAG ACT 3100 
Glu Thr Lys Thr Asp Gly Lys Ala Asp Glu Lys Thr Val Glu Lys Thr 
970 975 980 985 

GTT ACA GAA ACC AAG TCA GTA GGT GGA GGA AAA GTA ACA GAA GAG TGT 3148 
Val Thr Glu Thr Lys Ser Val Gly Gly Gly Lys Val Thr Glu Glu Cys 
990 995 1000 

ACA ATG ATA CAA ACA ACA GAT ACA TGG GTG ACG AGT ACG TCA TTG CAT 3196 
Thr Met Ile Gln Thr Thr Asp Thr Trp Val Thr Ser Thr Ser Leu His
1005 1010 1015

ACG AGT ACG ACA ACG AGT ACG TCA ACG GTG ACG TCG ACA GTG ACG TTG 3244 
Thr Ser Thr Thr Thr Ser Thr Ser Thr Val Thr Ser Thr Val Thr Leu 
1020 1025 1030 

ACT TCG ATG CGC AAG TGC AAG CCT ACC AAA TGT ACC ACC GAT TCA AGC 3292
Thr Ser Met Arg Lys Cys Lys Pro Thr Lys Cys Thr Thr Asp Ser Ser
1035 1040 1045 

AAA GAG ACA CAG AAA GAA GAA GAT GAT GAA GAA GTG AAA CCG AAT GAG 3340 
Lys Glu Thr Gln Lys Glu Glu Asp Asp Glu Glu Val Lys Pro Asn Glu
1050 1055 1060 1065 

GGA ATG AAA ATA AGA GTT CCT GAT ATG ATT AAA ATA ATG TTG TTG GGA 3388 
Gly Met Lys Ile Arg Val Pro Asp Met Ile Lys Ile Met Leu Leu Gly
1070 1075 1080 

GTG ATT GTT ATG GGG ATG ATG TAAATGAATG AAAAAAATGT TAATAGATTG 3439 
Val Ile Val Met Gly Met Met
1085 

AAAATGTGCA TATATCCATT GTTTATATAT AATAAAAATG TTAAAGAATG AAATGAAAAA 3499
AAAAAAAAAA AAAAAAAAAA AA 3521 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1088 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
Met Ala Arg Pro Val Lys Arg Gln Ala Val Gln Gly Ala Gln Asp Glu
1 5 10 15

Ile Asp Glu Lys His Leu Leu Ala Phe Ile Val Lys Asp Lys Tyr Lys
20 25 30 

Glu Glu Gln Lys Cys Lys Glu Glu Leu Glu Lys Tyr Cys Lys Glu Leu
35 40 45 

Lys Glu Ala Asp Lys Asn Leu Glu Asn Val Asp Asp Lys Val Lys Gly 
50 55 60 

Leu Cys Asp Asp Lys Lys Arg Asp Glu Lys Cys Lys Asp Val Lys Lys 
65 70 75 80 

Lys Val Glu Asp Glu Leu Lys Asp Phe Glu Glu Glu Leu Gln Lys Val
85 90 95 

Leu Asn Asn Ile Lys Asp Glu Asn Cys Glu Lys Tyr Glu Glu Lys Cys
100 105 110

Ile Leu Leu Glu Glu Thr Asp Tyr Asp Val Ile Lys Asp Asn Cys Ile
115 120 125 

Glu Leu Arg Glu Gly Cys Tyr Lys Leu Lys Arg Glu Lys Val Ala Glu 
130 135 140 

Glu Leu Leu Leu Arg Ala Leu Gly Gly Asp Ala Lys Glu Glu Ala Lys 
145 150 155 160

Cys Lys Gly Lys Met Asn Thr Val Cys Pro Val Leu Ser Arg Glu Ser 
165 170 175 

Asp Glu Leu Met Ser Phe Cys Leu Asp Ser Ala Lys Thr Cys Gly Asp 
180 185 190 

Leu Lys Lys Lys Leu Gly Thr Val Cys Glu Pro Leu Lys Lys Glu Leu 
195 200 205 

Lys Asp Asn Glu Leu Ala Glu Lys Cys His Glu Arg Leu Glu Lys Cys 
210 215 220 

His Phe Tyr Gly Glu Ala Cys Asp Asp Ala Lys Cys Lys Lys Phe Glu 
225 230 235 240 

Glu Gln Cys Lys Gly Lys Asn Ile Ile Tyr Lys Ala Pro Glu Ser Asp
245 250 255 

Leu Ser Pro Val Lys Pro Arg Ala Ser Leu Leu Arg Ser Ile Gly Leu
260 265 270 

Asp Asp Val Tyr Lys Asn Ala Glu Lys His Gly Ile Ile Ile Gly Lys
275 280 285 

Ser Gly Val Asp Leu Pro Arg Lys Ser Gly Thr Asn Phe Cys Lys Ile
290 295 300 

Ser Leu Leu Leu Leu Ser Arg Asp Glu Asp Lys Lys Glu Pro Asp Lys 
305 310 315 320

Lys Cys Thr Lys Ala Leu Glu Lys Cys Asp Ala Ser Lys Tyr Leu Asn
325 330 335 

Thr Glu Leu Glu Lys Leu Cys Lys Asp Gly Asn Lys Asn Glu Lys Cys 
340 345 350 
Lys Lys Ile Leu Asp Val Lys Glu Arg Cys Thr Asn Leu Lys Leu Lys
355 360 365 

Leu Tyr Leu Lys Gly Leu Ser Thr Glu Tyr Asp Asp Gln Glu Ser Asp
370 375 380 

Pro Leu Ser Trp Gly Gln Leu Pro Thr Phe Phe Ile Lys Gly Glu Cys
385 390 395 400 

Ala Glu Leu Glu Ser Glu Cys Phe Tyr Leu Glu Lys Ala Cys Lys Asp 
405 410 415 

Asn Asn Ile Asp Lys Ala Cys Gln Asn Ala Arg Ala Ala Cys Tyr Lys
420 425 430 

Lys Gly Gln Asp Arg Met Leu Asn Lys Phe Phe Gln Lys Glu Leu Lys
435 440 445 

Gly Lys Leu Gly His Val Arg Phe Tyr Ser Asp Pro Lys Asp Cys Lys 
450 455 460 

Lys Tyr Val Val Glu Asn Cys Thr Lys Leu Asp Lys Lys Tyr Leu Pro 
465 470 475 480

Arg Cys Leu Tyr Pro Lys Glu Leu Cys Tyr Gly Leu Ser Asn Asp Ile
485 490 495 

Phe Leu Gln Ser Lys Glu Leu Ser Ala Leu Leu Asp Asp Gln Arg Asp
500 505 510 

Phe Pro Leu Lys Lys Asp Cys Val Glu Leu Lys Glu Lys Cys Asp Glu 
515 520 525 

Leu Ser Ser Asp Ser Leu Leu Asn Leu Glu Lys Cys Ile Thr Leu Lys
530 535 540 

Arg Arg Cys Glu Tyr Phe Arg Val Ser Glu Gly Phe Arg Asn Val Phe 
545 550 555 560 

Leu Glu Lys Lys Asp Asp Ser Leu Met Thr Gln Asp Asn Cys Thr Lys
565 570 575 

Ala Leu His Glu Lys Cys His Gln Leu Tyr Arg Arg Arg Lys Asn Ser
580 585 590 

Phe Ser Val Ser Cys Ala Leu Pro Glu Glu Thr Cys Ser Tyr Met Val 
595 600 605 

Phe His Thr Ser Gln Asp Cys Ser Ser Leu Lys Val Asn Ile Lys Asn
610 615 620 

Glu Lys Ile Leu Glu Lys Ile Gly Glu Glu Ile Lys Lys Ala Asn Lys
625 630 635 640 

Asn Glu Ala Leu Val Glu Glu Leu Cys Thr Thr Trp Gly Arg His Cys 
645 650 655 

His Gln Leu Met Glu Asn Cys Pro Asp Asp Leu Lys Lys Lys Glu Asn
660 665 670 

Gly Asn Gly Asn Asp His Asn Cys Glu Ala Leu Gln Glu Lys Cys Asn
675 680 685 

Lys Thr Phe Glu Lys Leu Lys Leu Glu Glu Glu Leu Ser His Leu Leu 
690 695 700 
Lys Gly Ser Leu Lys Asp Asp Lys Cys Lys Glu Ala Leu Gly Lys Arg
705 710 715 720 

Cys Thr Glu Leu Glu Lys Asn Glu Ala Phe Lys Thr Leu Tyr Gly Lys 
725 730 735 

Cys Asp Asp Asn Thr Lys Glu Asn Val Cys Lys Lys Leu Val Asp Lys 
740 745 750 

Val Lys Lys Arg Cys Pro Thr Leu Lys Asp Glu Leu Glu Asn Ala Lys 
755 760 765 

Lys Glu Leu Thr Lys Met Lys Asn Glu Tyr Asp Asp Leu Lys Lys Ala 
770 775 780 

Ala Glu Lys Ser Thr Glu Ala Ala Lys Leu Leu Leu Ser Arg Pro Arg 
785 790 795 800 

Gln Thr Val Met Pro Asn Ala Gln Asn Gly Ser Asp Ser Thr Leu Val
805 810 815 

Pro Pro Pro Pro Gln Ala Pro Ala Gly Pro Pro Pro Pro Gly Ser Pro
820 825 830 

Pro Pro Pro Pro Ser Gln Asn Gly Thr Pro Gly Thr Pro Gly Gly Glu
835 840 845 

Thr Gly Ala Ser Gly Gly Thr Pro Gly Thr Pro Gly Thr Pro Gly Thr 
850 855 860

Pro Gly Thr Pro Gly Gly Met Met Lys Tyr Ala Lys Leu Gly Leu Val 
865 870 875 880 

Lys Arg Thr Tyr Val Asp Gly Gly Val Ser Glu Val Glu Val Lys Ala
885 890 895 

Phe Asp Ala Thr Thr Ile Ala Leu Glu Leu Tyr Leu Glu Leu Lys Glu
900 905 910 

Glu Cys Lys Ala Leu Glu Leu Asp Cys Gly Phe Lys Glu Asp Cys Pro 
915 920 925 

Asp Thr Lys Gln Ala Cys Glu Asn Ile Asp Thr Leu Cys Lys Leu Glu
930 935 940 

Pro Leu Glu Ile Lys Pro His His Thr Glu Lys Ile Thr Glu Thr Lys
945 950 955 960 

Thr Glu Thr Lys Thr Glu Thr Lys Thr Glu Thr Lys Thr Asp Gly Lys 
965 970 975 

Ala Asp Glu Lys Thr Val Glu Lys Thr Val Thr Glu Thr Lys Ser Val 
980 985 990 

Gly Gly Gly Lys Val Thr Glu Glu Cys Thr Met Ile Gln Thr Thr Asp
995 1000 1005 

Thr Trp Val Thr Ser Thr Ser Leu His Thr Ser Thr Thr Thr Ser Thr 
1010 1015 1020 

Ser Thr Val Thr Ser Thr Val Thr Leu Thr Ser Met Arg Lys Cys Lys 
1025 1030 1035 1040 

Pro Thr Lys Cys Thr Thr Asp Ser Ser Lys Glu Thr Gln Lys Glu Glu
1045 1050 1055 
Asp Asp Glu Glu Val Lys Pro Asn Glu Gly Met Lys Ile Arg Val Pro
1060 1065 1070 

Asp Met Ile Lys Ile Met Leu Leu Gly Val Ile Val Met Gly Met Met
1075 1080 1085 



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..20

(D) OTHER INFORMATION: /label= oligonucleotide 
/note= "primer JK58" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
TTAACCGGCC GTGCCATTGC 
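
As a quick consistency check, primer JK58 can be compared with the start of the coding region of SEQ ID NO: 8. The sketch below is illustrative only: the helper name and the short excerpt of SEQ ID NO: 8 (the bases around the ATG start codon, copied from the listing with spaces removed) are choices made for this example, not part of the listing.

```python
# Complement map for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

primer_jk58 = "TTAACCGGCCGTGCCATTGC"  # SEQ ID NO: 10

# Excerpt of SEQ ID NO: 8 spanning the ATG start codon (spaces removed).
seq8_start_region = "TGGCAATGGCACGGCCGGTTAAGAGGCAAGCA"

target = reverse_complement(primer_jk58)
offset = seq8_start_region.find(target)  # -1 would mean no exact match

print(target, offset)
```

The reverse complement of the primer occurs exactly once in the excerpt and covers the ATG start codon, which suggests JK58 is an antisense primer for this region; that reading is an inference from the sequences, not a statement made in the listing.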
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..35

(D) OTHER INFORMATION: /label= oligonucleotide 
/note= "primer ANC" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GACTGCATGC GGAAGCTTGG ATCCCCCCCC CCCCCC 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..24 
(D) OTHER INFORMATION: /label= oligonucleotide
/note= "primer AN" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GACTGCATGC GGAAGCTTGG ATCC 
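
The relationship between primers ANC (SEQ ID NO: 11) and AN (SEQ ID NO: 12) can be verified directly from the two sequences. The sketch below checks that AN is a prefix of ANC and measures the extra tail. The restriction-enzyme names attached to the recognition sequences are standard annotations added for this example and are an assumption about the primer design; the listing itself does not name them.

```python
# Primer sequences assembled from the 10-base blocks printed in the listing.
primer_anc = "GACTGCATGC" + "GGAAGCTTGG" + "ATCCCCCCCC" + "CCCCCC"  # SEQ ID NO: 11
primer_an = "GACTGCATGC" + "GGAAGCTTGG" + "ATCC"                    # SEQ ID NO: 12

# AN should be the leading part of ANC; whatever remains is the tail.
is_prefix = primer_anc.startswith(primer_an)
tail = primer_anc[len(primer_an):]

# Well-known recognition sequences (enzyme names are an added annotation).
sites = {"SphI": "GCATGC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}
site_positions = {name: primer_an.find(site) for name, site in sites.items()}

print(len(primer_anc), len(primer_an), is_prefix, tail)
print(site_positions)
```

On these sequences ANC is AN followed by a run of twelve C residues, consistent with the stated lengths of 36 and 24 bases.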
(2) INFORMATION FOR SEQ ID NO:13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..20 

(D) OTHER INFORMATION: /label= oligonucleotide 

/note= "5' primer corresponding to first 20 bases 
of GP3 mRNA" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
TTTTTCTAAT AGACGATATG 20 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..20 

(D) OTHER INFORMATION: /label= oligonucleotide 

/note= "3' primer corresponding to positions 77-96 
of GP3" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
GATCTCCACA TGTTTTAGCA 20 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY:

(B) LOCATION: 1..30

(D) OTHER INFORMATION: /label= oligonucleotide 
/note= "hybridization probe MSGl" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

GCAGAACTTG AGTCGGAATG TTTYTATTTA 
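
Probe MSG1 contains the ambiguity code Y (pyrimidine, C or T), so it can be compared with more than one of the listed sequences. The sketch below is a minimal illustration: the function name and the 30-base excerpts (the first 30 bases of SEQ ID NO: 6 and NO: 7, and the corresponding stretch of SEQ ID NO: 8, spaces removed) are choices made for this example.

```python
# Subset of the IUPAC nucleotide codes needed for this probe.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "R": "AG"}

def probe_matches(probe: str, target: str) -> bool:
    """True if every target base is allowed by the probe code at that position."""
    return len(probe) == len(target) and all(
        base in IUPAC[code] for code, base in zip(probe, target)
    )

probe_msg1 = "GCAGAACTTGAGTCGGAATGTTTYTATTTA"   # SEQ ID NO: 15

seq6_first30 = "GCAGAACTTGTCTCGGAATGTTTTTATTTA"  # SEQ ID NO: 6, bases 1-30
seq7_first30 = "GCAGAACTTGAGTCGGAATGTTTTTATTTA"  # SEQ ID NO: 7, bases 1-30
seq8_region = "GCAGAACTTGAGTCGGAATGTTTCTATTTA"   # excerpt of SEQ ID NO: 8

results = {
    "SEQ6": probe_matches(probe_msg1, seq6_first30),
    "SEQ7": probe_matches(probe_msg1, seq7_first30),
    "SEQ8": probe_matches(probe_msg1, seq8_region),
}
print(results)
```

With these excerpts the degenerate position accommodates both the T found in SEQ ID NO: 7 and the C found in SEQ ID NO: 8, while the excerpt of SEQ ID NO: 6 differs from the probe at two other positions and is not an exact match.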

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS : 
(A) LENGTH: 30 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..30 

(D) OTHER INFORMATION: /label= oligonucleotide
/note= "hybridization probe MSG2"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
AAAATATCTT CCACGATGTC TTTATCCTAA 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..30 

(D) OTHER INFORMATION: /label= oligonucleotide 
/note= "hybridization probe MSG3"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GAAAATAAAG ATAAGAGATA CCTTCCAAAG 
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..30 
(D) OTHER INFORMATION: /label= oligonucleotide
/note= "hybridization probe DHPS1" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
TTGATCACGA TATTAAGCCA GTTTTGCCAT 30
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 amino acids 

(B) TYPE : amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: internal 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Glu Leu Lys Gly Lys Leu Gly His Val Arg Phe Tyr Ser Asp Pro 
1 5 10 15 
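
The peptide of SEQ ID NO: 19 can be cross-checked against the coding sequence of SEQ ID NO: 8 by translating the matching codons. The following is a minimal sketch that assumes the standard genetic code; the compact codon table and the 45-base excerpt of SEQ ID NO: 8 (spaces removed) are set up for this example only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate an uppercase DNA string codon by codon (one-letter codes)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# Excerpt of the SEQ ID NO: 8 coding sequence (GAA TTG AAG ... GAT CCT).
seq8_excerpt = "GAATTGAAGGGAAAGCTTGGTCATGTAAGATTTTATAGCGATCCT"

# SEQ ID NO: 19 written in one-letter code:
# Glu Leu Lys Gly Lys Leu Gly His Val Arg Phe Tyr Ser Asp Pro
seq19_one_letter = "ELKGKLGHVRFYSDP"

translated = translate(seq8_excerpt)
print(translated, translated == seq19_one_letter)
```

Under the standard code the excerpt translates to the same 15 residues as SEQ ID NO: 19, consistent with its designation as an internal fragment.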
What is claimed is:

1. A DNA molecule encoding a mammalian Pneumocystis
carinii major surface glycoprotein or an allelic
variation thereof.

2. The DNA molecule according to claim 1 where the 
mammal is a rat. 

3. A DNA molecule encoding the gene for the major 
surface glycoprotein of P. carinii as shown in SEQ 
ID NO: 8. 

4. A DNA molecule encoding a portion of the gene for 
the major surface glycoprotein of P. carinii in a 
cDNA selected from the group consisting of SEQ ID 
NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4,
SEQ ID NO: 5, SEQ ID NO: 6 and SEQ ID NO: 7.

5. A DNA molecule according to claim 1 where the 
mammal is a human. 

6. A DNA molecule encoding a mammalian Pneumocystis
carinii major surface glycoprotein which is a
composite of a multiple gene family or is a
synthetic construction representing a consensus 
sequence analysis of a multiple gene family. 
7. A method of obtaining a DNA molecule encoding a
mammalian P. carinii major surface glycoprotein
which comprises screening a cDNA library of P.
carinii with an antibody to said major surface
glycoprotein to identify positive clones encoding
for gp116 and using at least one of said clones or
an oligonucleotide probe based on said clones to
reveal the presence of multiple genes encoding for
said major surface glycoprotein.

8. A mammalian Pneumocystis carinii major surface
glycoprotein having the amino acid sequence as
shown in SEQ ID NO: 9.

9. A mammalian Pneumocystis carinii major surface
glycoprotein produced from the expression of a DNA
sequence which is a composite of a multiple gene
family encoding for said major surface glycoprotein
or a synthetic construction representing a
consensus sequence analysis of a multiple gene
family.

10. The major surface glycoprotein according to claim
8 where the mammal is human.

11. A mammalian Pneumocystis carinii major surface
protein having a molecular weight of about 122977
or allelic variations thereof.

12. A vaccine comprising a therapeutically effective
amount of a mammalian Pneumocystis carinii major
surface glycoprotein or a polypeptide derived
therefrom capable of eliciting an immune response
to said glycoprotein, and a pharmaceutically
acceptable parenteral vehicle.



FIGURE 1A (sheet 1/6)


WO 94/' 



2/6 



PCT/US93/09635 






FIGURE 1B

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